Abstract

We present algorithms that learn certain classes of function-free recursive logic pro- 
grams in polynomial time from equivalence queries. In particular, we show that a single 
k-ary recursive constant-depth determinate clause is learnable. Two-clause programs
consisting of one learnable recursive clause and one constant-depth determinate non-recursive
clause are also learnable, if an additional "basecase" oracle is assumed. These results im- 
mediately imply the pac-learnability of these classes. Although these classes of learnable 
recursive programs are very constrained, it is shown in a companion paper that they are 
maximally general, in that generalizing either class in any natural way leads to a compu- 
tationally difficult learning problem. Thus, taken together with its companion paper, this 
paper establishes a boundary of efficient learnability for recursive logic programs. 

1. Introduction 

One active area of research in machine learning is learning concepts expressed in first- 
order logic. Since most researchers have used some variant of Prolog to represent learned 
concepts, this subarea is sometimes called inductive logic programming (ILP) (Muggleton, 
1992; Muggleton & De Raedt, 1994).

Within ILP, researchers have considered two broad classes of learning problems. The 
first class of problems, which we will call here logic-based relational learning problems,
are first-order variants of the sorts of classification problems typically considered within
the AI machine learning community: prototypical examples include Muggleton et al.'s (1992)
formulation of α-helix prediction, King et al.'s (1992) formulation of predicting drug
activity, and Zelle and Mooney's (1994) use of ILP techniques to learn control heuristics for
deterministic parsers. Logic-based relational learning often involves noisy examples that
reflect a relatively complex underlying relationship; it is a natural extension of propositional
machine learning, and has already enjoyed a number of experimental successes. 

In the second class of problems studied by ILP researchers, the target concept is a Prolog 
program that implements some common list-processing or arithmetic function; prototypical 
problems from this class might be learning to append two lists, or to multiply two numbers. 
These learning problems are similar in character to those studied in the area of automatic 
programming from examples (Summers, 1977; Biermann, 1978), and hence might be ap- 
propriately called automatic logic programming problems. Automatic logic programming 
problems are characterized by noise-free training data and recursive target concepts. Thus a 
problem that is central to the enterprise of automatic logic programming — but not, perhaps, 
logic-based relational learning — is the problem of learning recursive logic programs. 

The goal of this paper is to formally analyze the learnability of recursive logic programs
in Valiant's (1984) model of pac-learnability, thus hopefully shedding some light on the 
task of automatic logic programming. To summarize our results, we will show that some 
simple recursive programs are pac-learnable from examples alone, or from examples plus a 
small number of additional "hints". The largest learnable class we identify in a standard 
learning model is the class of one-clause constant-depth determinate programs with at most 
a constant number of "closed" recursive literals. The largest learnable class we identify 
that requires extra "hints" is the class of constant-depth determinate programs consisting 
of a single nonrecursive base clause and a single recursive clause from the class described 
above. All of our results are proved in the model of identification from equivalence queries 
(Angluin, 1988, 1989), which is somewhat stronger than pac-learnability. Identification from 
equivalence queries requires that the target concept be exactly identified, in polynomial 
time, and using only a polynomial number of equivalence queries. An equivalence query 
asks if a hypothesis program H is equivalent to the target program C; the answer to a 
query is either "yes" or an adversarially chosen example on which H and C differ. This
model of learnability is arguably more appropriate for automatic logic programming tasks 
than the weaker model of pac-learnability, as it is unclear how often an approximately 
correct recursive program will be useful. 

Interestingly, the learning algorithms analyzed are different from most existing ILP 
learning methods; they all employ an unusual method of generalizing examples called forced 
simulation. Forced simulation is a simple and analytically tractable alternative to other 
methods for generalizing recursive programs against examples, such as n-th root finding
(Muggleton, 1994), sub-unification (Aha, Lapointe, Ling, & Matwin, 1994) and recursive 
anti-unification (Idestam-Almquist, 1993), but it has been only rarely used in experimental 
ILP systems (Ling, 1991). 

The paper is organized as follows. After presenting some preliminary definitions, we 
begin by presenting (primarily for pedagogical reasons) a procedure for identifying from 
equivalence queries a single non-recursive constant-depth determinate clause. Then, in 
Section 4, we extend this learning algorithm, and the corresponding proof of correctness, 
to a simple class of recursive clauses: the class of "closed" linear recursive constant-depth 
determinate clauses. In Section 5, we relax some assumptions made to make the analysis 
easier, and present several extensions to this algorithm: we extend the algorithm from linear 
recursion to k-ary recursion, and also show how a k-ary recursive clause and a non-recursive
clause can be learned simultaneously given an additional "basecase" oracle. We then discuss 
related work and conclude. 

Although the learnable class of programs is large enough to include some well-known 
automatic logic programming benchmarks, it is extremely restricted. In a companion paper 
(Cohen, 1995), we provide a number of negative results, showing that relaxing any of these 
restrictions leads to difficult learning problems: in particular, learning problems that are 
either as hard as learning DNF (an open problem in computational learning theory), or as 
hard as cracking certain presumably secure cryptographic schemes. Thus, taken together 
with the results of the companion paper, our results delineate a boundary of learnability 
for recursive logic programs. 

Although the two papers are independent, we suggest that readers wishing to read both 
this paper and the companion paper read this paper first. 

2. Background

In this section we will present the technical background necessary to state our results. We 
will assume, however, that the reader is familiar with the basic elements of logic program- 
ming; readers without this background are referred to one of the standard texts, for example 
(Lloyd, 1987). 

2.1 Logic Programs 

Our treatment of logic programs is standard, except that we will usually consider the body 
of a clause to be an ordered set of literals. 

For most of this paper, we will consider logic programs without function symbols — 
i.e., programs written in Datalog (see footnote 1). The purpose of such a logic program is to answer
certain questions relative to a database, DB, which is a set of ground atomic facts. (When 
convenient, we will also think of DB as a conjunction of ground unit clauses.) The simplest 
use of a Datalog program is to check the status of a simple instance. A simple instance 
(for a program P and a database DB) is a fact f. The pair (P, DB) is said to cover f iff
$DB \wedge P \vdash f$. The set of simple instances covered by (P, DB) is precisely the minimal model
of the logic program $P \wedge DB$.

In this paper, we will primarily consider extended instances which consist of two parts: 
an instance fact f, which is simply a ground fact, and a description D, which is a finite set 
of ground unit clauses. An extended instance e = (f, D) is covered by (P, DB) iff

$$DB \wedge D \wedge P \vdash f$$

If extended instances are allowed, then function-free programs are expressive enough to 
encode surprisingly interesting programs. In particular, many programs that are usually 
written with function symbols can be rewritten as function-free programs, as the example
below illustrates. 

Example. Consider the usual program for appending two lists. 
append([],Ys,Ys).
append([X|Xs1],Ys,[X|Zs1]) ← append(Xs1,Ys,Zs1).

One could use this program to classify atomic facts containing function symbols 
such as append([1,2],[3],[1,2,3]). This program can be rewritten as a Datalog
program that classifies extended instances as follows: 

Program P:

append(Xs,Ys,Ys) ←
    null(Xs).
append(Xs,Ys,Zs) ←
    components(Xs,X,Xs1) ∧
    components(Zs,X,Zs1) ∧
    append(Xs1,Ys,Zs1).

Database DB:

null(nil).

The predicate components(A,B,C) means that A is a list with head B and tail
C; thus an extended instance equivalent to append([1,2],[3],[1,2,3]) would be

Instance fact f:

append(list12,list3,list123).

Description D:

components(list12,1,list2).
components(list2,2,nil).
components(list123,1,list23).
components(list23,2,list3).
components(list3,3,nil).

Footnote 1: The assumption that programs are function-free is made primarily for
convenience. In Section 5.2 we describe how this assumption can be relaxed.

We note that using extended instances as examples is closely related to using ground 
clauses entailed by the target clause as examples: specifically, the instance e = (f,D) is 
covered by (P, DB) iff $P \wedge DB \vdash (f \leftarrow D)$. As the example above shows, there is also a close
relationship between extended instances and literals with function symbols that have been 
removed by "flattening" (Rouveirol, 1994; De Raedt & Dzeroski, 1994). We have elected 
to use Datalog programs and the model of extended instances in this paper for several 
reasons. Datalog is relatively easy to analyze. There is a close connection between Datalog 
and the restrictions imposed by certain practical learning systems, such as FOIL (Quinlan,
1990; Quinlan & Cameron-Jones, 1993), FOCL (Pazzani & Kibler, 1992), and GOLEM
(Muggleton & Feng, 1992).

Finally, using extended instances addresses the following technical problem. The learn- 
ing problems considered in this paper involve restricted classes of logic programs. Often, the 
restrictions imply that the number of simple instances is polynomial; we note that with only 
a polynomial-size domain, questions about pac-learnability are usually trivial. Requiring 
learning algorithms to work over the domain of extended instances precludes trivial learning 
techniques, however, as the number of extended instances of size n is exponential in n even 
for highly restricted programs. 

2.2 Restrictions on Logic Programs 

In this paper, we will consider the learnability of various restricted classes of logic pro- 
grams. Below we will define some of these restrictions; however, we will first introduce 
some terminology. 

If $A \leftarrow B_1 \wedge \ldots \wedge B_r$ is an (ordered) definite clause, then the input variables of the literal
$B_i$ are those variables appearing in $B_i$ which also appear in the clause $A \leftarrow B_1 \wedge \ldots \wedge B_{i-1}$;
all other variables appearing in $B_i$ are called output variables. Also, if $A \leftarrow B_1 \wedge \ldots \wedge B_r$ is a
definite clause, then $B_i$ is said to be a recursive literal if it has the same predicate symbol
and arity as A, the head of the clause.
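
The terminology above can be made concrete with a minimal Python sketch. The representation is an assumption made only for illustration: a literal is a (predicate, argument tuple) pair, every argument is a variable name, and the function names `classify_variables` and `recursive_literals` are hypothetical, not taken from the paper.

```python
def classify_variables(head, body):
    """For each body literal, split its variables into inputs and outputs.

    A clause is a head literal plus an ordered list of body literals;
    each literal is a (predicate, argument_tuple) pair of variable names.
    """
    seen = set(head[1])  # variables of A <- B_1 ... B_{i-1} seen so far
    result = []
    for pred, args in body:
        distinct = list(dict.fromkeys(args))  # drop repeats, keep order
        inputs = [v for v in distinct if v in seen]
        outputs = [v for v in distinct if v not in seen]
        result.append((pred, inputs, outputs))
        seen.update(args)
    return result


def recursive_literals(head, body):
    """Body literals with the same predicate symbol and arity as the head."""
    return [lit for lit in body
            if lit[0] == head[0] and len(lit[1]) == len(head[1])]


# The recursive clause of the function-free append program.
head = ("append", ("Xs", "Ys", "Zs"))
body = [
    ("components", ("Xs", "X", "Xs1")),
    ("components", ("Zs", "X", "Zs1")),
    ("append", ("Xs1", "Ys", "Zs1")),
]
table = classify_variables(head, body)
for pred, inputs, outputs in table:
    print(pred, "inputs:", inputs, "outputs:", outputs)
print("recursive:", recursive_literals(head, body))
```

In this clause the recursive literal has no output variables, because every one of its arguments was introduced by the head or by an earlier literal.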

2.2.1 Types of Recursion

The first set of restrictions concerns the type of recursion that is allowed in a program.
If every clause in a program has at most one recursive literal, then the program is linear 
recursive. If every clause in a program has at most k recursive literals, then the program is 
k-ary recursive. Finally, if every recursive literal in a program contains no output variables, 
then we will say that the program is closed recursive. 

2.2.2 Determinacy and Depth 

The second set of restrictions are variants of restrictions originally introduced by Muggleton
and Feng (1992). If $A \leftarrow B_1 \wedge \ldots \wedge B_r$ is an (ordered) definite clause, the literal $B_i$ is
determinate iff for every possible substitution $\sigma$ that unifies A with some fact e such that

$$DB \vdash B_1\sigma \wedge \ldots \wedge B_{i-1}\sigma$$

there is at most one maximal substitution $\theta$ so that $DB \vdash B_i\sigma\theta$. A clause is determinate
if all of its literals are determinate. Informally, determinate clauses are those that can be
evaluated without backtracking by a Prolog interpreter.

We also define the depth of a variable appearing in a clause $A \leftarrow B_1 \wedge \ldots \wedge B_r$ as follows.
Variables appearing in the head of a clause have depth zero. Otherwise, let $B_i$ be the first
literal containing the variable V, and let d be the maximal depth of the input variables of
$B_i$; then the depth of V is d+1. The depth of a clause is the maximal depth of any variable
in the clause.

Muggleton and Feng define a logic program to be ij-determinate if it is determinate,
of constant depth i, and contains literals of arity j or less. In this paper we use the phrase
"constant-depth determinate" instead to denote this class of programs. Below are some
examples of constant-depth determinate programs, taken from Dzeroski, Muggleton and
Russell (1992).

Example. Assuming successor is functional, the following program is determinate.
The maximum depth of a variable is one, for the variable C in the second
clause, and hence the program is of depth one.

less_than(A,B) ← successor(A,B).
less_than(A,B) ← successor(A,C) ∧ less_than(C,B).

The following program is determinate and of depth two.

choose(A,B,C) ←
    zero(B) ∧
    one(C).
choose(A,B,C) ←
    decrement(B,D) ∧
    decrement(A,E) ∧
    multiply(B,C,G) ∧
    divide(G,A,F) ∧
    choose(E,D,F).
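
The depth definition can be checked mechanically. The following Python sketch is an illustration only; it assumes literals are (predicate, argument tuple) pairs whose arguments are all variable names, and it treats a literal with no input variables as having d = 0, a case the definition above does not cover. The function names are hypothetical.

```python
def variable_depths(head, body):
    """Map each variable of an ordered clause to its depth."""
    depth = {v: 0 for v in head[1]}  # head variables have depth zero
    for _, args in body:
        # Depths of the input variables of this literal.
        known = [depth[v] for v in args if v in depth]
        d = max(known, default=0)  # assumption: no inputs counts as d = 0
        for v in args:
            if v not in depth:     # first literal containing v
                depth[v] = d + 1
    return depth


def clause_depth(head, body):
    """The maximal depth of any variable in the clause."""
    return max(variable_depths(head, body).values(), default=0)


less_than = (("less_than", ("A", "B")),
             [("successor", ("A", "C")), ("less_than", ("C", "B"))])
choose = (("choose", ("A", "B", "C")),
          [("decrement", ("B", "D")),
           ("decrement", ("A", "E")),
           ("multiply", ("B", "C", "G")),
           ("divide", ("G", "A", "F")),
           ("choose", ("E", "D", "F"))])

print(clause_depth(*less_than))
print(clause_depth(*choose))
```

For the second clause of choose, the variable F is the only one of depth two: it first appears in divide(G,A,F), whose input variable G has depth one.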

The program GOLEM (Muggleton & Feng, 1992) learns constant-depth determinate 
programs, and related restrictions have been adopted by several other practical learning 
systems (Quinlan, 1991; Lavrac & Dzeroski, 1992; Cohen, 1993c). The learnability of 
constant-depth determinate clauses has also received some formal study, which we will 
review in Section 6. 



2.2.3 Mode Constraints and Declarations 

We define the mode of a literal L appearing in a clause C to be a string s such that the initial
character of s is the predicate symbol of L, and for j > 1 the j-th character of s is a "+" if
the (j - 1)-th argument of L is an input variable and a "-" if the (j - 1)-th argument of L
is an output variable. (This definition coincides with the usual definition of Prolog modes
only when all arguments to the head of a clause are inputs. This simplification is justified,
however, as we are considering only how clauses behave in classifying extended instances,
which are ground.) A mode constraint is simply a set of mode strings $R = \{s_1, \ldots, s_k\}$, and
a clause C is said to satisfy a mode constraint R for p if for every literal L in the body of
C, the mode of L is in R.

Example. In the following append program, every literal has been annotated
with its mode.

append(Xs,Ys,Ys) ←
    null(Xs).                  % mode: null +
append(Xs,Ys,Zs) ←
    components(Xs,X,Xs1) ∧     % mode: components + - -
    components(Zs,X,Zs1) ∧     % mode: components + + -
    append(Xs1,Ys,Zs1).        % mode: append + + +

The clauses of this program satisfy the following mode constraint:

{ components + - -, components + + -, components + - +,
  components - + +, components + + +, null +,
  append + + -, append + - +,
  append - + +, append + + + }
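
As a rough illustration, the mode of each body literal can be computed and checked against a mode constraint as follows. This is a sketch under assumptions: literals are (predicate, argument tuple) pairs of variable names, modes are written in the parenthesized style "components(+,-,-)", and the helper names `literal_modes` and `satisfies` are hypothetical.

```python
def literal_modes(head, body):
    """Return the mode string of every body literal, in order."""
    seen = set(head[1])  # head variables count as inputs
    modes = []
    for pred, args in body:
        signs = ",".join("+" if v in seen else "-" for v in args)
        modes.append(f"{pred}({signs})")
        seen.update(args)
    return modes


def satisfies(head, body, constraint):
    """True iff the mode of every body literal appears in the constraint."""
    return all(mode in constraint for mode in literal_modes(head, body))


base = (("append", ("Xs", "Ys", "Ys")), [("null", ("Xs",))])
step = (("append", ("Xs", "Ys", "Zs")),
        [("components", ("Xs", "X", "Xs1")),
         ("components", ("Zs", "X", "Zs1")),
         ("append", ("Xs1", "Ys", "Zs1"))])

R = {"components(+,-,-)", "components(+,+,-)", "components(+,-,+)",
     "components(-,+,+)", "components(+,+,+)", "null(+)",
     "append(+,+,-)", "append(+,-,+)", "append(-,+,+)", "append(+,+,+)"}

print(literal_modes(*base))
print(literal_modes(*step))
print(satisfies(*step, R))
```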



Mode constraints are commonly used in analyzing Prolog code; for instance, they are 
used in many Prolog compilers. We will sometimes use an alternative syntax for mode 
constraints that parallels the syntax used in most Prolog systems: for instance, we may 
write the mode constraint "components + - -" as "components(+,-,-)".

We define a declaration to be a tuple (p, a', R) where p is a predicate symbol, a' is an 
integer, and R is a mode constraint. We will say that a clause C satisfies a declaration if 
the head of C has arity a' and predicate symbol p, and if for every literal L in the body of 
C the mode of L appears in R. 

2.3 A Model of Learnability

In this section, we will present our model of learnability. We will first review the necessary 
definitions for a standard learning model, the model of learning from equivalence queries 
(Angluin, 1988, 1989), and discuss its relationship to other learning models. We will then 
introduce an extension to this model which is necessary for analyzing ILP problems. 

2.3.1 Identification From Equivalence Queries 

Let X be a set. We will call X the domain, and call the elements of X instances. Define a 
concept C over X to be a representation of some subset of X, and define a language Lang 
to be a set of concepts. In this paper, we will be rather casual about the distinction between 
a concept and the set it represents; when there is a risk of confusion we will refer to the set 
represented by a concept C as the extension of C. Two concepts $C_1$ and $C_2$ with the same
extension are said to be (semantically) equivalent. 

Associated with X and Lang are two size complexity measures, for which we will use 
the following notation: 

• The size complexity of a concept $C \in \mathrm{Lang}$ is written $\|C\|$.

• The size complexity of an instance $e \in X$ is written $\|e\|$.

• If S is a set, $S_n$ stands for the set of all elements of S of size complexity no greater
than n. For instance, $X_n = \{e \in X : \|e\| \le n\}$ and $\mathrm{Lang}_n = \{C \in \mathrm{Lang} : \|C\| \le n\}$.

We will assume that all size measures are polynomially related to the number of bits needed 
to represent C or e. 

The first learning model that we consider is the model of identification with equivalence 
queries. The goal of the learner is to identify some unknown target concept $C \in \mathrm{Lang}$,
that is, to construct some hypothesis $H \in \mathrm{Lang}$ such that $H \equiv C$. Information about the
target concept is gathered only through equivalence queries. The input to an equivalence
query for C is some hypothesis $H \in \mathrm{Lang}$. If $H \equiv C$, then the response to the query is
"yes". Otherwise, the response to the query is an arbitrarily chosen counterexample: an
instance e that is in the symmetric difference of C and H.

A deterministic algorithm Identify identifies Lang from equivalence queries iff for
every $C \in \mathrm{Lang}$, whenever Identify is run (with an oracle answering equivalence queries
for C) it eventually halts and outputs some $H \in \mathrm{Lang}$ such that $H \equiv C$. Identify
polynomially identifies Lang from equivalence queries iff there is a polynomial $poly(n_t, n_e)$
such that at any point in the execution of Identify the total running time is bounded by
$poly(n_t, n_e)$, where $n_t = \|C\|$ and $n_e$ is the size of the largest counterexample seen so far, or
0 if no equivalence queries have been made.
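
To make the query protocol concrete, here is a minimal Python sketch on a toy language that is not one of the languages studied in this paper: monotone conjunctions over n boolean attributes. The oracle, the finite domain, and the learner below are all assumptions chosen for illustration. The learner starts from the most specific hypothesis and generalizes on each positive counterexample.

```python
from itertools import product


def equivalence_query(hypothesis, target, domain):
    """Return None for "yes", else an instance in the symmetric difference."""
    for e in domain:
        if hypothesis(e) != target(e):
            return e
    return None


def conjunction(indices):
    """The concept that is true when every listed attribute equals 1."""
    return lambda e: all(e[i] == 1 for i in indices)


def identify(n, target):
    """Exactly identify a monotone conjunction; returns (indices, queries)."""
    domain = list(product([0, 1], repeat=n))
    hypothesis = set(range(n))  # most specific: all attributes required
    queries = 0
    while True:
        queries += 1
        e = equivalence_query(conjunction(hypothesis), target, domain)
        if e is None:
            return hypothesis, queries
        if not target(e):
            # A negative counterexample cannot be fixed by generalizing.
            raise ValueError("no consistent hypothesis")
        # Generalize minimally: drop attributes the positive example lacks.
        hypothesis = {i for i in hypothesis if e[i] == 1}


learned, queries = identify(4, conjunction({0, 2}))
print(sorted(learned), queries)
```

Because the hypothesis is always at least as specific as the target, every counterexample the oracle returns here is positive, and each one removes at least one attribute.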

2.3.2 Relation to Pac-Learnability 

The model of identification from equivalence queries has been well-studied (Angluin, 1988, 
1989). It is known that if a language is learnable in this model, then it is also learnable 
in Valiant's (1984) model of pac-learnability. (The basic idea behind this result is that 
an equivalence query for the hypothesis H can be emulated by drawing a set of random 
examples of a certain size. If any of them is a counterexample to H , then one returns
the found counterexample as the answer to the equivalence query. If no counterexamples 
are found, one can assume with high confidence that H is approximately equivalent to the 
target concept.) Thus identification from equivalence queries is a strictly stronger model 
than pac-learnability. 
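
The emulation described in the parenthetical can be sketched as follows. This is an illustration under assumptions: the `target` function stands in for the labels attached to randomly drawn examples, `draw` is a hypothetical sampler for the example distribution, and the sample-size schedule is the one usually attributed to Angluin (1988) for the i-th query; it is not derived in this paper.

```python
import math
import random


def sample_size(epsilon, delta, i):
    """Examples drawn for the i-th query (assumed schedule, see lead-in)."""
    return math.ceil((math.log(1.0 / delta) + i * math.log(2.0)) / epsilon)


def sampled_equivalence_query(hypothesis, target, draw, m):
    """Emulate an equivalence query with m random examples.

    Returns the first counterexample found, or None when none is found,
    in which case H is assumed to be approximately equivalent to the target.
    """
    for _ in range(m):
        e = draw()
        if hypothesis(e) != target(e):
            return e
    return None


rng = random.Random(0)            # fixed seed so the run is repeatable
draw = lambda: rng.randint(0, 9)  # toy distribution over ten instances
is_even = lambda x: x % 2 == 0

m = sample_size(0.1, 0.05, 1)
print(m)
print(sampled_equivalence_query(is_even, is_even, draw, m))
print(sampled_equivalence_query(lambda x: False, lambda x: True, draw, m) is not None)
```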

Most existing positive results on the pac-learnability of logic programs rely on showing 
that every concept in the target language can be emulated by a boolean concept from 
some pac-learnable class (Dzeroski et al., 1992; Cohen, 1994). While such results can be 
illuminating, they are also disappointing, since one of the motivations for considering first- 
order representations in the first place is that they allow one to express concepts that cannot 
be easily expressed in boolean logic. One advantage of studying the exact identification 
model and considering recursive programs is that it essentially precludes use of this sort of 
proof technique: while many recursive programs can be approximated by boolean functions 
over a fixed set of attributes, few can be exactly emulated by boolean functions.

2.3.3 Background Knowledge in Learning 

The framework described above is standard, and is one possible formalization of the usual 
situation in inductive concept learning, in which a user provides a set of examples (in 
this case counterexamples to queries) and the learning system attempts to find a useful 
hypothesis. However, in a typical ILP system, the setting is slightly different, as usually 
the user provides clues about the target concept in addition to the examples. In most ILP 
systems the user provides a database DB of "background knowledge" in addition to a set 
of examples; in this paper, we will assume that the user also provides a declaration. To 
account for these additional inputs it is necessary to extend the framework described above 
to a setting where the learner accepts inputs other than training examples. 

To formalize this, we introduce the following notion of a "language family". If Lang is
a set of clauses, DB is a database and Dec is a declaration, we will define Lang[DB, Dec]
to be the set of all pairs (C, DB) such that $C \in \mathrm{Lang}$ and C satisfies Dec. Semantically,
such a pair will denote the set of all extended instances (f,D) covered by (C,DB). Next,
if $\mathcal{DB}$ is a set of databases and $\mathcal{DEC}$ is a set of declarations, then define

$$\mathrm{Lang}[\mathcal{DB}, \mathcal{DEC}] = \{ \mathrm{Lang}[DB, Dec] : DB \in \mathcal{DB} \text{ and } Dec \in \mathcal{DEC} \}$$

This set of languages is called a language family. 

We will now extend the definition of identification from equivalence queries to language
families as follows. A language family $\mathrm{Lang}[\mathcal{DB}, \mathcal{DEC}]$ is identifiable from equivalence
queries iff every language in the set is identifiable from equivalence queries. A language
family $\mathrm{Lang}[\mathcal{DB}, \mathcal{DEC}]$ is uniformly identifiable from equivalence queries iff there is a single
algorithm Identify(DB, Dec) that identifies any language Lang[DB, Dec] in the family
given DB and Dec.

Uniform polynomial identifiability of a language family is defined analogously:
$\mathrm{Lang}[\mathcal{DB}, \mathcal{DEC}]$ is uniformly polynomially identifiable from equivalence queries iff there is a
polynomial time algorithm Identify(DB, Dec) that identifies any language Lang[DB, Dec]
in the family given DB and Dec. Note that Identify must run in time polynomial in the
size of the inputs Dec and DB as well as the target concept.

2.3.4 Restricted Types of Background Knowledge

We will now describe a number of restricted classes of databases and declarations. 

One restriction which we will make throughout this paper is to assume that all of the 
predicates of interest are of bounded arity. We will use the notation $a$-$\mathcal{DB}$ for the set of all
databases that contain only facts of arity a or less, and the notation $a$-$\mathcal{DEC}$ for the set of
all declarations (p, a', R) such that every string $s \in R$ is of length a + 1 or less.

For technical reasons, it will often be convenient to assume that a database contains an 
equality predicate, that is, a predicate symbol equal such that $equal(t_i, t_i) \in DB$ for every
constant $t_i$ appearing in DB, and $equal(t_i, t_j) \notin DB$ for any $t_i \neq t_j$. Similarly, we will
often wish to assume that a declaration allows literals of the form equal(X,Y), where X
and Y are input variables. If $\mathcal{DB}$ (respectively $\mathcal{DEC}$) is any set of databases (declarations)
we will use $\mathcal{DB}^{=}$ ($\mathcal{DEC}^{=}$) to denote the corresponding set, with the additional restriction
that the database (declaration) must contain an equality predicate (respectively the mode
equal(+,+)).

It will sometimes also be convenient to assume that a declaration (p, a', R) allows only 
a single valid mode for each predicate: i.e., that for each predicate q there is in R only 
a single mode constraint of the form $q\alpha$. Such a declaration will be called a unique-mode
declaration. If $\mathcal{DEC}$ is any set of declarations we will use $\mathcal{DEC}^{1}$ to denote the corresponding
set of declarations with the additional restriction that the declaration is unique-mode.

Finally, we note that in a typical setting, the facts that appear in a database DB and
descriptions D of extended instances are not arbitrary: instead, they are representative of
some "real" predicate (e.g., the relationship of a list to its components in the example above).
One way of formalizing this is to assume that all facts will be drawn from some restricted set $\mathcal{F}$;
using this assumption one can define the notion of a determinate mode. If $f = p(t_1, \ldots, t_k)$
is a fact with predicate symbol p and $p\alpha$ is a mode, then define $inputs(f, p\alpha)$ to be the
tuple $\langle t_{i_1}, \ldots, t_{i_k} \rangle$, where $i_1, \ldots, i_k$ are the indices of $\alpha$ containing a "+". Also define
$outputs(f, p\alpha)$ to be the tuple $\langle t_{j_1}, \ldots, t_{j_l} \rangle$, where $j_1, \ldots, j_l$ are the indices of $\alpha$ containing
a "-". A mode string $p\alpha$ for a predicate p is determinate for $\mathcal{F}$ iff the relation

$$\{ (inputs(f, p\alpha), outputs(f, p\alpha)) : f \in \mathcal{F} \}$$

is a function. Informally, a mode is determinate if the input positions of the facts in $\mathcal{F}$
functionally determine the output positions.
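
The test for a determinate mode amounts to checking that a finite relation is a function. A minimal Python sketch, assuming facts are (predicate, argument tuple) pairs and a mode is given as a predicate name plus a string of "+" and "-" signs (both conventions, and the function names, are assumptions made for illustration):

```python
def split_fact(fact, signs):
    """Return (inputs, outputs) of a ground fact under the given signs."""
    _, args = fact
    ins = tuple(a for a, s in zip(args, signs) if s == "+")
    outs = tuple(a for a, s in zip(args, signs) if s == "-")
    return ins, outs


def is_determinate(pred, signs, facts):
    """True iff inputs functionally determine outputs over the given facts."""
    image = {}
    for fact in facts:
        if fact[0] != pred or len(fact[1]) != len(signs):
            continue  # a different predicate or arity is irrelevant here
        ins, outs = split_fact(fact, signs)
        if image.setdefault(ins, outs) != outs:
            return False  # same inputs, two different outputs
    return True


facts = [
    ("components", ("list12", "1", "list2")),
    ("components", ("list2", "2", "nil")),
    ("components", ("list123", "1", "list23")),
    ("components", ("list23", "2", "list3")),
    ("components", ("list3", "3", "nil")),
]

print(is_determinate("components", "+--", facts))
print(is_determinate("components", "-+-", facts))
```

On these facts a list determines its head and tail, but a head element alone does not determine the list it came from, so the second mode fails the test.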

The set of all declarations containing only modes determinate for $\mathcal{F}$ will be denoted
$\mathcal{D}et\mathcal{DEC}_{\mathcal{F}}$. However, in this paper, the set $\mathcal{F}$ will be assumed to be fixed, and thus we will
generally omit the subscript.

A program consistent with a determinate declaration $Dec \in \mathcal{D}et\mathcal{DEC}$ must be determinate,
as defined above; in other words, consistency with a determinate declaration is a
sufficient condition for semantic determinacy. It is also a condition that can be verified with 
a simple syntactic test. 

2.3.5 Size Measures for Logic Programs 

Assuming that all predicates are arity a or less for some constant a also allows very simple 
size measures to be used. In this paper, we will measure the size of a database DB by its 
cardinality; the size of an extended instance (f,D) by the cardinality of D; the size of a 
declaration (p, a', R) by the cardinality of R; and the size of a clause $A \leftarrow B_1 \wedge \ldots \wedge B_r$ by
the number of literals in its body.

3. Learning a Nonrecursive Clause 

The learning algorithms presented in this paper all use a generalization technique which 
we call forced simulation. By way of an introduction to this technique, we will consider a 
learning algorithm for non-recursive constant-depth clauses. While this result is presented 
primarily for pedagogical reasons, it may be of interest on its own: it is independent of 
previous proofs of the pac-learnability of this class (Dzeroski et al., 1992), and it is also 
somewhat more rigorous than previous proofs. 

Although the details and analysis of the algorithm for non-recursive clauses are somewhat
involved, the basic idea behind the algorithm is quite simple. First, a highly specific
"bottom clause" is constructed, using two operations that we call DEEPEN and
CONSTRAIN. Second, this bottom clause is generalized by deleting literals so that it covers
the positive examples: the algorithm for generalizing a clause to cover an example is
(roughly) to simulate the clause on the example, and delete any literals that would cause 
the clause to fail. In the remainder of this section we will describe and analyze this learning 
algorithm in detail. 

3.1 Constructing a "Bottom Clause" 

Let Dec = (p, a', R) be a declaration and let $A \leftarrow B_1 \wedge \ldots \wedge B_r$ be a definite clause. We
define

$$DEEPEN_{Dec}(A \leftarrow B_1 \wedge \ldots \wedge B_r) = A \leftarrow B_1 \wedge \ldots \wedge B_r \wedge \Big( \bigwedge_{L_i \in \mathcal{L}_D} L_i \Big)$$

where $\mathcal{L}_D$ is a maximal set of literals $L_i$ that satisfy the following conditions:

• the clause $A \leftarrow B_1 \wedge \ldots \wedge B_r \wedge L_i$ satisfies the mode constraints given in R;

• if $L_i \in \mathcal{L}_D$ has the same mode and predicate symbol as some other $L_j \in \mathcal{L}_D$, then the
input variables of $L_i$ are different from the input variables of $L_j$;

• every $L_i$ has at least one output variable, and the output variables of $L_i$ are all
different from each other, and are also different from the output variables of any
other $L_j \in \mathcal{L}_D$.

As an extension of this notation, we define $DEEPEN^i_{Dec}(C)$ to be the result of applying
the function $DEEPEN_{Dec}$ repeatedly i times to C, i.e.,

$$DEEPEN^i_{Dec}(C) = \begin{cases} C & \text{if } i = 0 \\ DEEPEN_{Dec}(DEEPEN^{i-1}_{Dec}(C)) & \text{otherwise} \end{cases}$$

We define the function $CONSTRAIN_{Dec}$ as

$$CONSTRAIN_{Dec}(A \leftarrow B_1 \wedge \ldots \wedge B_r) = A \leftarrow B_1 \wedge \ldots \wedge B_r \wedge \Big( \bigwedge_{L_i \in \mathcal{L}_C} L_i \Big)$$

where $\mathcal{L}_C$ is the set of all literals $L_i$ such that $A \leftarrow B_1 \wedge \ldots \wedge B_r \wedge L_i$ satisfies the mode
constraints given in R, and $L_i$ contains no output variables.

Example. Let D0 be the declaration (p, 2, R) where R contains the mode
constraints mother(+,-), father(+,-), male(+), female(+), and equal(+,+).
Then

$DEEPEN_{D0}(p(X,Y) \leftarrow)$ =
p(X,Y) ←
    mother(X,XM) ∧ father(X,XF) ∧ mother(Y,YM) ∧ father(Y,YF)

$DEEPEN^2_{D0}(p(X,Y) \leftarrow)$ = $DEEPEN_{D0}(DEEPEN_{D0}(p(X,Y) \leftarrow))$ =
p(X,Y) ←
    mother(X,XM) ∧ father(X,XF) ∧ mother(Y,YM) ∧ father(Y,YF) ∧
    mother(XM,XMM) ∧ father(XM,XMF) ∧ mother(XF,XFM) ∧ father(XF,XFF) ∧
    mother(YM,YMM) ∧ father(YM,YMF) ∧ mother(YF,YFM) ∧ father(YF,YFF)

$CONSTRAIN_{D0}(DEEPEN_{D0}(p(X,Y) \leftarrow))$ =
p(X,Y) ←
    mother(X,XM) ∧ father(X,XF) ∧ mother(Y,YM) ∧ father(Y,YF) ∧
    male(X) ∧ female(X) ∧ male(Y) ∧ female(Y) ∧
    male(XM) ∧ female(XM) ∧ male(XF) ∧ female(XF) ∧
    male(YM) ∧ female(YM) ∧ male(YF) ∧ female(YF) ∧
    equal(X,X) ∧ equal(X,XM) ∧ equal(X,XF) ∧
    equal(X,Y) ∧ equal(X,YM) ∧ equal(X,YF) ∧
    equal(XM,X) ∧ equal(XM,XM) ∧ equal(XM,XF) ∧
    equal(XM,Y) ∧ equal(XM,YM) ∧ equal(XM,YF) ∧
    equal(XF,X) ∧ equal(XF,XM) ∧ equal(XF,XF) ∧
    equal(XF,Y) ∧ equal(XF,YM) ∧ equal(XF,YF) ∧
    equal(Y,X) ∧ equal(Y,XM) ∧ equal(Y,XF) ∧
    equal(Y,Y) ∧ equal(Y,YM) ∧ equal(Y,YF) ∧
    equal(YM,X) ∧ equal(YM,XM) ∧ equal(YM,XF) ∧
    equal(YM,Y) ∧ equal(YM,YM) ∧ equal(YM,YF) ∧
    equal(YF,X) ∧ equal(YF,XM) ∧ equal(YF,XF) ∧
    equal(YF,Y) ∧ equal(YF,YM) ∧ equal(YF,YF)
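
The two operations can be sketched in Python to reproduce the literal counts of this example. The sketch rests on several assumptions: literals are (predicate, argument tuple) pairs, the declaration is unique-mode and given as a list of (predicate, signs) pairs, fresh output variables are named V1, V2, ... rather than XM, XF, ..., and a literal whose inputs already occur in the body under the same predicate is not added again (which matches the example, where the second application adds only literals on the new variables).

```python
from itertools import count, product


def clause_variables(head, body):
    """Variables of a clause, in order of first appearance."""
    ordered = []
    for _, args in [head] + list(body):
        for v in args:
            if v not in ordered:
                ordered.append(v)
    return ordered


def deepen(head, body, modes, fresh):
    """One DEEPEN step: add each literal that has at least one output."""
    variables = clause_variables(head, body)
    new_body = list(body)
    for pred, signs in modes:
        if "-" not in signs:
            continue
        # Input tuples already used by this predicate (assumed skip rule).
        present = {tuple(a for a, s in zip(args, signs) if s == "+")
                   for p, args in body
                   if p == pred and len(args) == len(signs)}
        for inputs in product(variables, repeat=signs.count("+")):
            if inputs in present:
                continue
            remaining = iter(inputs)
            args = tuple(next(remaining) if s == "+" else fresh()
                         for s in signs)
            new_body.append((pred, args))
    return new_body


def constrain(head, body, modes):
    """CONSTRAIN: add every mode-respecting literal with no outputs."""
    variables = clause_variables(head, body)
    new_body = list(body)
    for pred, signs in modes:
        if "-" in signs:
            continue
        for inputs in product(variables, repeat=len(signs)):
            if (pred, inputs) not in new_body:
                new_body.append((pred, inputs))
    return new_body


D0 = [("mother", "+-"), ("father", "+-"),
      ("male", "+"), ("female", "+"), ("equal", "++")]
counter = count(1)
fresh = lambda: f"V{next(counter)}"  # hypothetical fresh-variable naming

head = ("p", ("X", "Y"))
once = deepen(head, [], D0, fresh)
twice = deepen(head, once, D0, fresh)
bottom1 = constrain(head, once, D0)
print(len(once), len(twice), len(bottom1))
```

The counts agree with the listings above: 4 literals after one DEEPEN step, 12 after two, and 4 + 12 + 36 = 52 after CONSTRAIN is applied to the depth-one clause.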

Let us say that clause $C_1$ is a subclause of clause $C_2$ if the heads of $C_1$ and $C_2$ are
identical, if every literal in the body of $C_1$ also appears in $C_2$, and if the literals in the
body of $C_1$ appear in the same order as they do in $C_2$. The functions DEEPEN and
CONSTRAIN allow one to easily describe a clause with an interesting property. 

Theorem 1 Let Dec = (p, a', R) be a declaration in $a$-$\mathcal{D}et\mathcal{DEC}^{=}$, let $X_1, \ldots, X_{a'}$ be distinct
variables, and define the clause $BOTTOM^*_d$ as follows:

$$BOTTOM^*_d(Dec) = CONSTRAIN_{Dec}(DEEPEN^d_{Dec}(p(X_1, \ldots, X_{a'}) \leftarrow))$$

For any constants d and a, the following are true:

• the size of $BOTTOM^*_d(Dec)$ is polynomial in $\|Dec\|$;

• every depth-d clause that satisfies Dec (and hence, is determinate) is (semantically)
equivalent to some subclause of $BOTTOM^*_d(Dec)$.

Proof: See Appendix A. A related result also appears in Muggleton and Feng (1992). ■

begin algorithm Force1_NR(d, Dec, DB):
    % below BOTTOM*_d is the most specific possible clause
    let H ← BOTTOM*_d(Dec)
    repeat
        Ans ← answer to the query "Is H correct?"
        if Ans = "yes" then return H
        elseif Ans is a negative example then
            return "no consistent hypothesis"
        elseif Ans is a positive example e+ then
            % generalize H minimally to cover e+
            let (f,D) be the components of the extended instance e+
            H ← ForceSim_NR(H, f, Dec, (DB ∪ D))
            if H = FAILURE then
                return "no consistent hypothesis"
            endif
        endif
    endrepeat
end

Figure 1: A learning algorithm for nonrecursive depth-d determinate clauses



Example. Below C_1 and D_1 are equivalent, as are C_2 and D_2. Notice that D_1
and D_2 are subclauses of BOTTOM*_1(Dec).

C_1: p(A,B) ← mother(A,C) ∧ father(A,D) ∧ mother(B,C) ∧ father(B,D) ∧ male(A)
D_1: p(X,Y) ← mother(X,XM) ∧ father(X,XF) ∧ mother(Y,YM) ∧ father(Y,YF) ∧
              male(X) ∧ equal(XM,YM) ∧ equal(XF,YF)
C_2: p(A,B) ← father(A,B) ∧ female(A)
D_2: p(X,Y) ← father(X,XF) ∧ female(X) ∧ equal(XF,Y)

For C_1 and D_1, p(X,Y) is true when X is Y's brother. For C_2 and D_2, p(X,Y)
is true when X is Y's daughter, and Y is X's father.
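The subclause relation used in Theorem 1 amounts to an order-preserving containment test on
clause bodies. The following Python sketch is illustrative only: the tuple encoding of
literals and the name is_subclause are assumptions made for this illustration, and the
"bottom" clause shown is a small hand-picked fragment, not the full BOTTOM*_1(Dec).

```python
# A literal is (predicate, args); a clause is (head_literal, [body_literals]).
# This encoding is an assumption of the sketch, not notation from the paper.

def is_subclause(c1, c2):
    """True if c1 has the same head as c2 and c1's body is an
    order-preserving subsequence of c2's body."""
    head1, body1 = c1
    head2, body2 = c2
    if head1 != head2:
        return False
    remaining = iter(body2)
    # "lit in remaining" advances the iterator, so order is respected.
    return all(lit in remaining for lit in body1)

head = ("p", ("X", "Y"))
bottom_fragment = (head, [
    ("father", ("X", "XF")),
    ("male", ("X",)),
    ("female", ("X",)),
    ("equal", ("XF", "Y")),
])
d2 = (head, [
    ("father", ("X", "XF")),
    ("female", ("X",)),
    ("equal", ("XF", "Y")),
])
reordered = (head, [
    ("equal", ("XF", "Y")),
    ("father", ("X", "XF")),
])
```

Under these assumptions, d2 is a subclause of the fragment, while the reordered clause is
not, because its literals do not appear in the same order.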

3.2 The Learning Algorithm 

Theorem 1 suggests that it may be possible to learn non-recursive constant-depth
determinate clauses by searching the space of subclauses of BOTTOM*_d in some efficient
manner. Figures 1 and 2 present an algorithm called Force1_NR that does this when Dec is
a unique-mode declaration.

Figure 1 presents the top-level learning algorithm, Force1_NR. Force1_NR takes as
input a database DB and a declaration Dec, and begins by hypothesizing the clause
BOTTOM*_d(Dec). After each positive counterexample e+, the current hypothesis is
generalized as little as possible in order to cover e+. This strategy means that the
hypothesis is always the least general hypothesis that covers the positive examples; hence,
if a negative counterexample e− is ever seen, the algorithm will abort with a message that
no consistent hypothesis exists.

begin subroutine ForceSim_NR(H, f, Dec, DB):
    % "forcibly simulate" H on fact f
    if f ∈ DB then return H
    elseif the head of H and f cannot be unified then
        return FAILURE
    else
        let H' ← H
        let σ be the mgu of f and the head of H'
        for each literal L in the body of H' do
            if there is a substitution σ' such that Lσσ' ∈ DB then
                σ ← σ ∘ σ', where σ' is the most general such substitution
            else
                delete L from the body of H', together with
                all literals L' supported (directly or indirectly) by L
            endif
        endfor
        return H'
    endif
end

Figure 2: Forced simulation for nonrecursive depth-d determinate clauses

To minimally generalize a hypothesis H, the function ForceSim_NR is used. This
subroutine is shown in Figure 2. In the figure, the following terminology is used. If some
output variable of L is an input variable of L', then we say that L directly supports L'. We
will say that L supports L' iff L directly supports L', or if L directly supports some literal
L'' that supports L'. (Thus "supports" is the transitive closure of "directly supports".)
ForceSim_NR deletes from H the minimal number of literals necessary to let H cover e+. To
do this, ForceSim_NR simulates the action of a Prolog interpreter in evaluating H, except
that whenever a literal L in the body of H would fail, that literal is deleted, along with all
literals L' supported by L.
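The forced simulation just described can be sketched in Python. This is a minimal
illustration under stated assumptions, not the paper's implementation: literals are encoded
as (predicate, args) tuples, variables are strings starting with an uppercase letter, modes
is a hypothetical dictionary giving the unique mode string of each predicate, FAILURE is
represented by None, and the body is assumed to list every literal after the literals that
support it.

```python
def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return term[:1].isupper()

def match(lit, sigma, db):
    """Extend sigma so that lit becomes a fact of db; None if impossible."""
    pred, args = lit
    for fpred, fargs in db:
        if fpred != pred or len(fargs) != len(args):
            continue
        new = dict(sigma)
        if all((new.setdefault(a, c) == c) if is_var(a) else (a == c)
               for a, c in zip(args, fargs)):
            return new
    return None

def force_sim_nr(clause, fact, modes, db):
    """Generalize clause minimally so that it covers fact (None = FAILURE)."""
    head, body = clause
    if fact in db:
        return clause
    sigma = match(head, {}, {fact})          # unify the head with the fact
    if sigma is None:
        return None
    kept, dead = [], set()                   # dead: outputs of deleted literals
    for lit in body:
        pred, args = lit
        ins = {a for a, m in zip(args, modes[pred]) if m == "+"}
        outs = {a for a, m in zip(args, modes[pred]) if m == "-"}
        # A literal whose input was produced by a deleted literal is
        # supported by that literal, so it is deleted as well.
        new = None if ins & dead else match(lit, sigma, db)
        if new is None:
            dead |= outs
        else:
            sigma = new
            kept.append(lit)
    return (head, kept)

modes = {"mother": "+-", "father": "+-", "male": "+", "female": "+", "equal": "++"}
db = {
    ("mother", ("bob", "carol")), ("father", ("bob", "dave")),
    ("mother", ("ann", "carol")), ("father", ("ann", "dave")),
    ("male", ("bob",)), ("female", ("ann",)),
    ("equal", ("carol", "carol")), ("equal", ("dave", "dave")),
}
clause = (("p", ("X", "Y")), [
    ("mother", ("X", "XM")), ("father", ("X", "XF")),
    ("mother", ("Y", "YM")), ("father", ("Y", "YF")),
    ("male", ("X",)), ("female", ("X",)),
    ("equal", ("XM", "YM")), ("equal", ("XF", "YF")),
])
brother = force_sim_nr(clause, ("p", ("bob", "ann")), modes, db)
```

With this invented family database, simulating on p(bob, ann) deletes only female(X). The
set of "dead" output variables is one simple way to realize the "supports" closure in a
single left-to-right pass.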

The idea of learning by repeated generalization is an old one; in particular, previous
methods exist for learning a definite clause by generalizing a highly-specific one. For
example, CLINT (De Raedt & Bruynooghe, 1992) generalizes a "starting clause" guided
by queries made to the user; PROGOL (Srinivasan, Muggleton, King, & Sternberg, 1994)
guides a top-down generalization process with a known bottom clause; and Rouveirol (1994)
describes a method for generalizing bottom clauses created by saturation. The Force1_NR
algorithm is thus of interest not for its novelty, but because it is provably correct and
efficient, as noted in the theorem below.

In particular, let d-DepthNonRec be the language of nonrecursive clauses of depth
d or less (and hence i-DepthNonRec[DB, j-DetDEC] is the language of nonrecursive
ij-determinate clauses). We have the following result:

Theorem 2 For any constants a and d, the language family

    d-DepthNonRec[DB_=, a-DetDEC_{=1}]

is uniformly identifiable from equivalence queries.

Proof: We will show that Force1_NR uniformly identifies this language family with a
polynomial number of queries. We begin with the following important lemma, which
characterizes the behavior of ForceSim_NR.

Lemma 3 Let Dec be a declaration in DetDEC_{=1}, let DB be a database, let f be a fact, and
let H be a determinate nonrecursive clause that satisfies Dec. Then one of the following
conditions must hold:

• ForceSim_NR(H, f, Dec, DB) returns FAILURE, and no subclause H' of H satisfies
  both Dec and the constraint H' ∧ DB ⊢ f; or,

• ForceSim_NR(H, f, Dec, DB) returns a clause H', and H' is the unique syntactically
  largest subclause of H that satisfies both Dec and the constraint H' ∧ DB ⊢ f.

Proof of lemma: To avoid repetition, we will refer to the syntactically maximal subclauses
H' of H that satisfy both Dec and the constraint H' ∧ DB ⊢ f as "admissible subclauses"
in the proof below.

Clearly the lemma is true if H or FAILURE is returned by ForceSim_NR. In the remaining
cases the for loop of the algorithm is executed, and we must establish these two claims
(under the assumptions that the head A of H and f unify, and that f ∉ DB):

Claim 1. If L is retained, then every admissible subclause contains L.

Claim 2. If L is deleted, then no admissible subclause contains L.

First, however, observe that deleting a literal L may cause the mode of some other
literals to violate the mode declarations of Dec. It is easy to see that if L is deleted from
a clause C, then the mode of all literals L' directly supported by L will change. Thus if C
satisfies a unique-mode declaration prior to the deletion of L, then after the deletion of L
all literals L' that are directly supported by L will have invalid modes.

Now, to see that Claim 1 is true, suppose instead that it is false. Then there must
be some maximal subclause C' of H that satisfies Dec, covers the fact f, and does not
contain L. By the argument above, if C' does not contain L but satisfies Dec, then C'
contains no literals L' from H that are supported by L. Hence the output variables of L
are disjoint from the variables appearing in C'. This means that if L were to be added to
C' the resulting clause would still satisfy Dec and cover f, which leads to a contradiction
since C' was assumed to be maximal.

To verify Claim 2, let us introduce the following terminology. If C = (A ← B_1 ∧ ... ∧ B_r)
is a clause and DB is a database, we will say that the substitution θ is a (DB, f)-witness
for C iff θ is associated with a proof that C ∧ DB ⊢ f (or more precisely, iff Aθ = f and
∀i : 1 ≤ i ≤ r, B_iθ ∈ DB). We claim that the following condition is an invariant of the for
loop of the ForceSim_NR algorithm.

Invariant 1. Let C be any admissible subclause that contains all the literals in H'
preceding L (i.e., that contains all those literals of H that were retained on previous
iterations of the algorithm). Then every (DB, f)-witness for C is a superset of σ.

This can be easily established by induction on the number of iterations of the for loop. The
condition is true when the loop is first entered, since σ is initially the most general unifier
of A and f. The condition remains true after an iteration in which L is deleted, since σ
is unchanged. Finally, the condition remains true after an iteration in which L is retained:
because σ' is maximally general, it may only assign values to the output variables of L, and
by determinacy only one assignment to the output variables of L can make L true. Hence
every (DB, f)-witness for C must contain the bindings in σ.
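The (DB, f)-witness condition defined above (Aθ = f and every B_iθ in DB) can be checked
mechanically. The sketch below is only an illustration: the tuple encoding of literals, the
dictionary encoding of θ, and the names apply_subst and is_witness are assumptions of this
example, and the small family database is invented.

```python
def apply_subst(lit, theta):
    """Apply substitution theta (a dict from variables to constants) to a literal."""
    pred, args = lit
    return (pred, tuple(theta.get(a, a) for a in args))

def is_witness(clause, theta, db, fact):
    """Check that theta maps the head to fact and every body literal into db."""
    head, body = clause
    return apply_subst(head, theta) == fact and all(
        apply_subst(b, theta) in db for b in body)

# D_2 from the earlier example: p(X,Y) <- father(X,XF), female(X), equal(XF,Y)
d2 = (("p", ("X", "Y")), [
    ("father", ("X", "XF")),
    ("female", ("X",)),
    ("equal", ("XF", "Y")),
])
db = {("father", ("ann", "dave")), ("female", ("ann",)),
      ("equal", ("dave", "dave"))}
fact = ("p", ("ann", "dave"))
good = {"X": "ann", "Y": "dave", "XF": "dave"}
bad = {"X": "ann", "Y": "dave", "XF": "carol"}
```

Here good is a witness for d2 and bad is not. Because the clause is determinate, a witness
is unique once the head bindings are fixed, which is the property Invariant 1 relies on.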

Next, with an inductive argument and Claim 1 one can show that every admissible
subclause C must contain all the literals that have been retained in previous iterations of
the loop, leading to the following strengthening of Invariant 1:

Invariant 1'. Let C be any admissible subclause. Then every (DB, f)-witness for C is a
superset of σ.

Now, notice that only two types of literals are deleted: (a) literals L such that no superset
of σ can make L true, and (b) literals L' that are supported by a literal L of the preceding
type. In case (a), clearly L cannot be part of any admissible subclause, since no superset
of σ makes L succeed, and only such supersets can be witnesses of admissible clauses. In
case (b), again L' cannot be part of any admissible subclause, since its declaration is invalid
unless L is present in the clause, and by the argument above L cannot be in the clause.
This concludes the proof of the lemma. ■

To prove the theorem, we must now establish the following properties of the identification
algorithm.

Correctness. By Theorem 1, if the target program is in d-DepthNonRec[DB, Dec],
then there is some clause C_T that is equivalent to the target, and is a subclause of
BOTTOM*_d(Dec). H is initially BOTTOM*_d and hence a superclause of C_T. Now consider
invoking ForceSim_NR on any positive counterexample e+. By Lemma 3, if this invocation
is successful, H will be replaced by H', the longest subclause of H that covers e+. Since
C_T is a subclause of H that covers e+, this means that H' will again be a superclause of
C_T. Inductively, then, the hypothesis is always a superclause of the target.

Further, since the counterexample e+ is always an instance that is not covered by the
current hypothesis H, every time the hypothesis is updated, the new hypothesis is a proper
subclause of the old. This means that Force1_NR will eventually identify the target clause.

Efficiency. The number of queries made is polynomial in ||Dec||, since H is
initially of size polynomial in ||Dec||, and is reduced in size each time a counterexample is
provided. To see that each counterexample is processed in time polynomial in n_b, n_e, and
n_t, notice that since the length of H is polynomial, the number of repetitions of the for
loop of ForceSim_NR is also polynomial; further, since the arity of literals L is bounded by
a, only a·n_b + a·n_e constants exist in DB ∪ D, and hence there are at most (a·n_b + a·n_e)^a
substitutions σ' to check inside the for loop, which is again polynomial. Thus each execution
of ForceSim_NR requires only polynomial time.

This concludes the proof. ■



4. Learning a Linear Closed Recursive Clause 

Recall that if a clause has only one recursive literal, then the clause is linear recursive,
and that if no recursive literal contains output variables, then the clause is closed linear
recursive. In this section, we will describe how the Force1_NR algorithm can be extended to
learn a single linear closed recursive clause.[2] Before presenting the extension, however, we
would first like to discuss a reasonable-sounding approach that, on closer examination, turns
out to be incorrect.

4.1 A Remark on Recursive Clauses 

One plausible first step toward extending Force1_NR to recursive clauses is to allow recursive
literals in hypotheses, and treat them the same way as other literals: that is, to include
recursive literals in the initial clause BOTTOM*_d, and delete these literals gradually as
positive examples are received. A problem with this approach is that there is no simple
way to check if a recursive literal in a clause succeeds or fails on a particular example. This
makes it impossible to simply run ForceSim_NR on clauses containing recursive literals.

A straightforward (apparent) solution to this problem is to assume that an oracle exists
which can be queried as to the success or failure of any recursive literal. For closed recursive
clauses, it is sufficient to assume that there is an oracle MEMBER_{C_T}(DB, f) that answers
the question

    Does DB ∧ C_T ⊢ f ?

where C_T is the unknown target concept, f is a ground fact, and DB is a database. Given
such an oracle, one can determine if a closed recursive literal L_r should be retained by
checking if MEMBER_{C_T}(DB, L_r σ) is true. Such an oracle is very close to the notion of a
membership query as used in computational learning theory.

This is a natural extension of the Force1_NR learning algorithm to recursive clauses; in
fact an algorithm based on similar ideas has been previously conjectured to pac-learn
closed recursive constant-depth determinate clauses (Dzeroski et al., 1992). Unfortunately,
this algorithm can fail to return a clause that is consistent with a positive counterexample.
To illustrate this, consider the following example.

2. The reader may object that useful recursive programs always have at least two clauses: a
recursive clause and a nonrecursive base case. In posing the problem of learning a single
recursive clause, we are thus assuming the non-recursive "base case" of the target program is
provided as background knowledge, either in the background database DB, or in the
description atoms D of extended instances.

Example. Consider using the extension of Force1_NR described above to learn the
following target program:

    append(Xs,Ys,Zs) ←
        components(Xs,X1,Xs1),
        components(Zs,Z1,Zs1),
        X1 = Z1,
        append(Xs1,Ys,Zs1).

This program is determinate, has depth 1, and satisfies the following set of
declarations:

    components(+,-,-).
    null(+).
    equal(+,+).
    odd(+).
    append(+,+,+).

We will assume also a database DB that defines the predicate null to be true
for empty lists, and odd to be true for the constants 1 and 3.

To see how the forced simulation can fail, consider the following positive instance
e = (f, D):

    f = append(l12, l3, l123)
    D = { cons(l123,1,l23), cons(l23,2,l3), cons(l3,3,nil),
          cons(l12,1,l2), cons(l2,2,nil),
          append(nil,l3,l3) }

This is simply a "flattened" form of append([1,2],[3],[1,2,3]), together with the
appropriate base case append([],[3],[3]). Now consider beginning with the clause
BOTTOM*_1 and generalizing it using ForceSim_NR to cover this positive instance.
This process is illustrated in Figure 3. The clause on the left in the figure is
BOTTOM*_1(Dec); the clause on the right is the output of forcibly simulating
this clause on f with ForceSim_NR. (For clarity we've assumed that only the
single correct recursive call remains after forced simulation.)

The resulting clause is incorrect, in that it does not cover the given example e.
This can be easily seen by stepping through the actions of a Prolog interpreter
with the generalized clause of Figure 3. The nonrecursive literals will all
succeed, leading to the subgoal append(l2,l3,l23) (or in the usual Prolog notation,
append([2],[3],[2,3])). This subgoal will fail at the literal odd(X1), because X1
is bound to 2 for this subgoal, and the fact odd(2) is not true in DB ∪ D.
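The failure described above can be replayed with a small top-down interpreter for
determinate clauses with one closed recursive literal. This Python sketch is an illustration
under assumptions: the tuple encoding, the names match, apply_subst and covers, and the
depth bound of 10 are all hypothetical, the list-decomposition facts are written with the
predicate name components so that they match the clause body (the text prints them as
cons), and equal(c,c) facts are added for every constant.

```python
def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return term[:1].isupper()

def match(lit, sigma, db):
    """Extend sigma so that lit becomes a fact of db; None if impossible."""
    pred, args = lit
    for fpred, fargs in db:
        if fpred != pred or len(fargs) != len(args):
            continue
        new = dict(sigma)
        if all((new.setdefault(a, c) == c) if is_var(a) else (a == c)
               for a, c in zip(args, fargs)):
            return new
    return None

def apply_subst(lit, sigma):
    pred, args = lit
    return (pred, tuple(sigma.get(a, a) for a in args))

def covers(clause, fact, db, h):
    """Prolog-style evaluation of (head, body, recursive_literal) on a ground fact."""
    head, body, rec = clause
    if h < 0:
        return False
    if fact in db:                      # base case supplied as background knowledge
        return True
    sigma = match(head, {}, {fact})
    if sigma is None:
        return False
    for lit in body:                    # no literal is deleted here: failure is final
        sigma = match(lit, sigma, db)
        if sigma is None:
            return False
    return covers(clause, apply_subst(rec, sigma), db, h - 1)

consts = ["l123", "l23", "l3", "l12", "l2", "nil", "1", "2", "3"]
db = {
    ("components", ("l123", "1", "l23")), ("components", ("l23", "2", "l3")),
    ("components", ("l3", "3", "nil")), ("components", ("l12", "1", "l2")),
    ("components", ("l2", "2", "nil")), ("append", ("nil", "l3", "l3")),
    ("null", ("nil",)), ("odd", ("1",)), ("odd", ("3",)),
} | {("equal", (c, c)) for c in consts}

head = ("append", ("Xs", "Ys", "Zs"))
rec = ("append", ("Xs1", "Ys", "Zs1"))
comps = [("components", ("Xs", "X1", "Xs1")),
         ("components", ("Ys", "Y1", "Ys1")),
         ("components", ("Zs", "Z1", "Zs1"))]
# Right-hand clause of Figure 3 (output of ForceSim_NR plus the oracle).
figure3 = (head, comps + [("null", ("Ys1",)), ("equal", ("X1", "Z1")),
                          ("odd", ("X1",)), ("odd", ("Y1",)), ("odd", ("Z1",))], rec)
# The same clause with odd(X1) and odd(Z1) removed.
repaired = (head, comps + [("null", ("Ys1",)), ("equal", ("X1", "Z1")),
                           ("odd", ("Y1",))], rec)
f = ("append", ("l12", "l3", "l123"))
```

Under these assumptions, covers(figure3, f, db, 10) is false because the second-level
subgoal binds X1 to 2, whereas the clause without odd(X1) and odd(Z1) does cover f.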

This example illustrates a pitfall in the policy of treating recursive and non-recursive
literals in a uniform manner. (For more discussion, see also Bergadano & Gunetti, 1993; De
Raedt, Lavrac, & Dzeroski, 1993.) Unlike nonrecursive literals, the truth of the fact L_r σ
(corresponding to the recursive literal L_r) does not imply that a clause containing L_r will
succeed; it may be that while the first subgoal L_r σ succeeds, deeper subgoals fail.

BOTTOM*_1(Dec):                  ForceSim_NR(BOTTOM*_1(Dec), f, Dec, DB ∪ D):

append(Xs,Ys,Zs) ←               append(Xs,Ys,Zs) ←
  components(Xs,X1,Xs1) ∧          components(Xs,X1,Xs1) ∧
  components(Ys,Y1,Ys1) ∧          components(Ys,Y1,Ys1) ∧
  components(Zs,Z1,Zs1) ∧          components(Zs,Z1,Zs1) ∧
  null(Xs) ∧                       null(Ys1) ∧
  null(Ys) ∧                       equal(X1,Z1) ∧
  ...                              odd(X1) ∧
  null(Ys1) ∧                      odd(Y1) ∧
  null(Zs1) ∧                      odd(Z1) ∧
  equal(Xs,Xs) ∧                   append(Xs1,Ys,Zs1).
  ...
  equal(X1,Z1) ∧
  ...
  equal(Zs1,Zs1) ∧
  odd(Xs) ∧
  ...
  odd(X1) ∧
  odd(Y1) ∧
  odd(Z1) ∧
  ...
  odd(Zs1) ∧
  append(Xs,Xs,Xs) ∧
  ...
  append(Zs1,Zs1,Zs1).

Figure 3: A recursive clause before and after generalization with ForceSim_NR



4.2 Forced Simulation for Recursive Clauses 

A solution to this problem is to replace the calls to the membership oracle in the algorithm
sketched above with a call to a routine that forcibly simulates the actions of a top-down
theorem-prover on a recursive clause. In particular, the following algorithm is suggested.
First, build a nonrecursive "bottom clause", as was done in Force1_NR. Second, find some
recursive literal L_r such that appending L_r to the bottom clause yields a recursive clause
that can be generalized to cover the positive examples.

As in the nonrecursive case, a clause is generalized by deleting literals, using a
straightforward generalization of the procedure for forced simulation of nonrecursive
clauses. During forced simulation, any failing nonrecursive subgoals are simply deleted;
however, when a recursive literal L_r is encountered, one forcibly simulates the hypothesis
clause recursively on the corresponding recursive subgoal. An implementation of forced
simulation for linear closed recursive clauses is shown in Figure 4.

begin subroutine ForceSim(H, f, Dec, DB, h):
    % "forcibly simulate" recursive clause H on f
    % 1. check for infinite loops
    if h < 0 then return FAILURE
    % 2. check to see if f is already covered
    elseif f ∈ DB then return H
    % 3. check to see if f cannot be covered
    elseif the head of H and f cannot be unified then
        return FAILURE
    else
        let L_r be the recursive literal of H
        let H' ← H − {L_r}
        % 4. delete failing non-recursive literals as in ForceSim_NR
        let A be the head of H'
        let σ be the mgu of A and f
        for each literal L in the body of H' do
            if there is a substitution σ' such that Lσσ' ∈ DB
            then σ ← σ ∘ σ', where σ' is the most general such substitution
            else
                delete L from the body of H', together with
                all literals L' supported (directly or indirectly) by L
            endif
        endfor
        % 5. generalize H' on the recursive subgoal L_r σ
        if L_r σ is ground then return ForceSim(H' ∪ {L_r}, L_r σ, Dec, DB, h − 1)
        else return FAILURE
        endif
    endif
end

Figure 4: Forced simulation for linear closed recursive clauses

The extended algorithm is similar to ForceSim_NR, but differs in that when the recursive
literal L_r is reached in the simulation of H, the corresponding subgoal L_r σ is created, and
the hypothesized clause is recursively forcibly simulated on this subgoal. This ensures that
the generalized clause will also succeed on the subgoal. For reasons that will become clear
shortly, we would like this algorithm to terminate, even if the original clause H enters an
infinite loop when used in a top-down interpreter. In order to ensure termination, an extra
argument h is passed to ForceSim. The argument h represents a depth bound for the forced
simulation.

To summarize, the basic idea behind the algorithm of Figure 4 is to simulate the
hypothesized clause H on f, and generalize H by deleting literals whenever H would fail on
f or on any subgoal of f.

Example.

Consider using ForceSim to forcibly simulate the following recursive clause
BOTTOM*_1(Dec) ∪ L_r:

    append(Xs,Ys,Zs) ←
        components(Xs,X1,Xs1) ∧ components(Ys,Y1,Ys1) ∧ components(Zs,Z1,Zs1) ∧
        null(Xs) ∧ ... ∧ null(Zs1) ∧
        odd(Xs) ∧ ... ∧ odd(Zs1) ∧
        equal(Xs,Xs) ∧ ... ∧ equal(Zs1,Zs1) ∧
        append(Xs1,Ys,Zs1)

Here the recursive literal L_r is append(Xs1,Ys,Zs1). We will also assume that f
is taken from the extended query e = (f, D), which is again the flattened version
of the instance append([1,2],[3],[1,2,3]) used in the previous example; that Dec
is the set of declarations in the previous example; and that the database DB
is D ∪ {null(nil)}.

After executing steps 1-4 of ForceSim, a number of failing literals are deleted,
leading to the substitution σ [3] of {Xs = [1,2], Ys = [3], Zs = [1,2,3], X1 = 1,
Xs1 = [2], Y1 = 3, Ys1 = [], Z1 = 1, Zs1 = [2,3]} and the following reduced
clause:

    append(Xs,Ys,Zs) ←
        components(Xs,X1,Xs1) ∧ components(Ys,Y1,Ys1) ∧ components(Zs,Z1,Zs1) ∧
        null(Ys1) ∧ odd(X1) ∧ odd(Y1) ∧ odd(Z1) ∧ equal(X1,Z1) ∧
        append(Xs1,Ys,Zs1)

Hence the recursive subgoal is

    L_r σ = append(Xs1,Ys,Zs1)σ = append([2],[3],[2,3])

Recursively applying ForceSim to this goal produces the substitution {Xs = [2],
Ys = [3], Zs = [2,3], X1 = 2, Xs1 = [], Y1 = 3, Ys1 = [], Z1 = 2, Zs1 = [3]}
and also results in deleting the additional literals odd(X1) and odd(Z1). The
next recursive subgoal is L_r σ = append([],[3],[3]); since this fact is included
in the database DB, ForceSim will terminate. The final clause returned by
ForceSim in this case is the following:

    append(Xs,Ys,Zs) ←
        components(Xs,X1,Xs1) ∧ components(Ys,Y1,Ys1) ∧ components(Zs,Z1,Zs1) ∧
        null(Ys1) ∧ odd(Y1) ∧ equal(X1,Z1) ∧
        append(Xs1,Ys,Zs1)

Notice that this clause does cover e.

3. Note that for readability, we are using the term notation rather than the flattened
notation of Xs = l12, Ys = l3, etc.
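The worked example can be reproduced with a short Python sketch of recursive forced
simulation. It is a sketch under assumptions rather than the paper's code: the clause is
encoded as (head, body, recursive_literal), FAILURE is None, modes is a hypothetical
dictionary of unique mode strings, the starting body is a small hand-picked fragment of
BOTTOM*_1(Dec) rather than the full clause, the decomposition facts use the predicate
name components, and odd and equal(c,c) facts are added to the database so that the
literals of the example can succeed.

```python
def is_var(term):
    return term[:1].isupper()           # assumed variable convention

def match(lit, sigma, db):
    """Extend sigma so that lit becomes a fact of db; None if impossible."""
    pred, args = lit
    for fpred, fargs in db:
        if fpred != pred or len(fargs) != len(args):
            continue
        new = dict(sigma)
        if all((new.setdefault(a, c) == c) if is_var(a) else (a == c)
               for a, c in zip(args, fargs)):
            return new
    return None

def force_sim(clause, fact, modes, db, h):
    head, body, rec = clause
    if h < 0:                            # step 1: depth bound exhausted
        return None
    if fact in db:                       # step 2: fact already covered
        return clause
    sigma = match(head, {}, {fact})      # step 3: head must unify with fact
    if sigma is None:
        return None
    kept, dead = [], set()               # step 4: delete failing literals
    for lit in body:
        pred, args = lit
        ins = {a for a, m in zip(args, modes[pred]) if m == "+"}
        outs = {a for a, m in zip(args, modes[pred]) if m == "-"}
        new = None if ins & dead else match(lit, sigma, db)
        if new is None:
            dead |= outs
        else:
            sigma = new
            kept.append(lit)
    subgoal = (rec[0], tuple(sigma.get(a, a) for a in rec[1]))
    if any(is_var(t) for t in subgoal[1]):
        return None                      # recursive subgoal is not ground
    return force_sim((head, kept, rec), subgoal, modes, db, h - 1)  # step 5

modes = {"components": "+--", "null": "+", "odd": "+", "equal": "++"}
consts = ["l123", "l23", "l3", "l12", "l2", "nil", "1", "2", "3"]
db = {
    ("components", ("l123", "1", "l23")), ("components", ("l23", "2", "l3")),
    ("components", ("l3", "3", "nil")), ("components", ("l12", "1", "l2")),
    ("components", ("l2", "2", "nil")), ("append", ("nil", "l3", "l3")),
    ("null", ("nil",)), ("odd", ("1",)), ("odd", ("3",)),
} | {("equal", (c, c)) for c in consts}

head = ("append", ("Xs", "Ys", "Zs"))
rec = ("append", ("Xs1", "Ys", "Zs1"))
comps = [("components", ("Xs", "X1", "Xs1")),
         ("components", ("Ys", "Y1", "Ys1")),
         ("components", ("Zs", "Z1", "Zs1"))]
start = (head, comps + [
    ("null", ("Xs",)), ("null", ("Ys1",)), ("null", ("Zs1",)),
    ("odd", ("X1",)), ("odd", ("Y1",)), ("odd", ("Z1",)),
    ("equal", ("X1", "Z1")), ("equal", ("X1", "Y1")),
], rec)
result = force_sim(start, ("append", ("l12", "l3", "l123")), modes, db, 10)
```

With these inputs the returned body keeps the three components literals together with
null(Ys1), odd(Y1) and equal(X1,Z1), matching the final clause of the example. Note that a
depth bound of 1 is too small for this instance, since the proof has depth 2.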

As in Section 3 we begin our analysis by showing the correctness of the forced simulation
algorithm, i.e., by showing that forced simulation does indeed produce a unique maximally
specific generalization of the input clause that covers the example.

This proof of correctness uses induction on the depth of a proof. Let us introduce again
some additional notation, and write P ∧ DB ⊢_h f if the Prolog program (P, DB) can be
used to prove the fact f in a proof of depth h or less. (The notion of depth of a proof is the
usual one; we will define looking up f in the database DB to be a proof of depth zero.) We
have the following result concerning the ForceSim algorithm.

Theorem 4 Let Dec be a declaration in DetDEC_{=1}, let DB be a database, let f be a fact,
and let H be a determinate closed linear recursive clause that satisfies Dec. Then one of
the following conditions must hold:

• ForceSim(H, f, Dec, DB, h) returns FAILURE, and no recursive subclause H' of H
  satisfies both Dec and the constraint H' ∧ DB ⊢_h f; or,

• ForceSim(H, f, Dec, DB, h) returns a clause H', and H' is the unique syntactically
  largest recursive subclause of H that satisfies both Dec and the constraint H' ∧ DB ⊢_h f.

Proof: Again to avoid repetition, we will refer to syntactically maximal recursive
(nonrecursive) subclauses H' of H that satisfy both Dec and the constraint H' ∧ DB ⊢_h f as
"admissible recursive (nonrecursive) subclauses" respectively.

The proof largely parallels the proof of Lemma 3; in particular, similar arguments
show that the clause returned by ForceSim satisfies the conditions of the theorem whenever
FAILURE is returned and whenever H is returned. Note that the correctness of ForceSim
when H is returned establishes the base case of the theorem for h = 0.

For the case of depth h > 0, let us assume the theorem holds for depth h − 1 and
proceed using mathematical induction. The arguments of Lemma 3 show that the following
condition is true after the for loop terminates.

Invariant 1'. H' is the unique maximal nonrecursive admissible subclause of H, and every
(DB, f)-witness for H' is a superset of σ.

Now, let us assume that there is some admissible recursive subclause H*. Clearly H* must
contain the recursive literal L_r of H, since L_r is the only recursive literal of H. Further,
the nonrecursive clause Ĥ = H* − {L_r} must certainly satisfy Dec and also Ĥ ∧ DB ⊢ f,
so it must (by the maximality of H') be a subclause of H'. Hence H* must be a subclause
of H' ∪ {L_r}. Finally, if L_r σ is ground (i.e., if L_r is closed in the clause H' ∪ L_r) then by
Invariant 1', the clause H* must also satisfy H* ∧ DB ⊢ L_r σ by a proof of depth h − 1.
(This is simply equivalent to saying that the recursive subgoal of L_r σ generated in the proof
must succeed.)

By the inductive hypothesis, then, the recursive call must return the unique maximal
admissible recursive subclause of H' ∪ L_r, which by the argument above must also be the
unique maximal admissible recursive subclause of H.

Thus by induction the theorem holds. ■

begin algorithm Force1(d, Dec, DB):
    % below BOTTOM*_d is the most specific possible clause
    let L_{r_1}, ..., L_{r_p} be all possible closed recursive literals for BOTTOM*_d(Dec)
    choose an unmarked recursive literal L_{r_i}
    let H ← BOTTOM*_d(Dec) ∪ {L_{r_i}}
    repeat
        Ans ← answer to the query "Is H correct?"
        if Ans = "yes" then return H
        elseif Ans is a negative example e− then
            H ← FAILURE
        elseif Ans is a positive example e+ then
            % generalize H minimally to cover e+
            let (f, D) be the components of e+
            H ← ForceSim(H, f, Dec, (DB ∪ D), (a||D|| + a||DB||)^{a'})
            where a' is the arity of the clause head as given in Dec
        endif
        if H = FAILURE then
            if all recursive literals are marked then
                return "no consistent hypothesis"
            else
                mark L_{r_i}
                choose an unmarked recursive literal L_{r_j}
                let H ← BOTTOM*_d(Dec) ∪ {L_{r_j}}
            endif
        endif
    endrepeat
end

Figure 5: A learning algorithm for linear closed recursive depth-d determinate clauses

4.3 A Learning Algorithm for Linear Recursive Clauses

Given this method for generalizing recursive clauses, one can construct a learning
algorithm for recursive clauses as follows. First, guess a recursive literal L_r, and make
H = BOTTOM*_d ∪ L_r the initial hypothesis of the learner. Then, ask a series of equivalence
queries. After a positive counterexample e+, use forced simulation to minimally generalize
H to cover e+. After a negative example, choose another recursive literal L'_r, and reset the
hypothesis to H = BOTTOM*_d ∪ L'_r.

Figure 5 presents an algorithm that operates along these lines. Let d-DepthLinRec
denote the language of linear closed recursive clauses of depth d or less. We have the
following result:

Theorem 5 For any constants a and d, the language family

    d-DepthLinRec[DB_=, a-DetDEC_{=1}]

is uniformly identifiable from equivalence queries.

Proof: We will show that Force1 uniformly identifies this language family with a
polynomial number of queries.

Correctness and query efficiency. There are at most a||D|| + a||DB|| constants in
any set DB ∪ D, at most (a||D|| + a||DB||)^{a'} a'-tuples of such constants, and hence at most
(a||D|| + a||DB||)^{a'} distinct recursive subgoals L_r σ that might be produced in proving that a
linear recursive clause C covers an extended instance (f, D). Thus every terminating proof
of a fact f using a linear recursive clause C must be of depth (a||D|| + a||DB||)^{a'} or less; i.e.,
for h = (a||D|| + a||DB||)^{a'},

    C ∧ DB ∧ D ⊢_h f  iff  C ∧ DB ∧ D ⊢ f

Thus Theorem 4 can be strengthened: for the value of h used in Force1, the subroutine
ForceSim returns the syntactically largest subclause of H that covers the example (f, D)
whenever any such subclause exists, and returns FAILURE otherwise.
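To make the counting argument concrete, the bound can be evaluated on the append instance
used earlier. The Python sketch below is illustrative only: the function names are
hypothetical, and it assumes that ||D|| and ||DB|| are read as the number of atoms in D and
DB, which is one possible reading of the size measure.

```python
from itertools import product

def depth_bound(a, a_head, size_d, size_db):
    """h = (a*||D|| + a*||DB||) ** a', with sizes taken as atom counts (assumption)."""
    return (a * size_d + a * size_db) ** a_head

def ground_subgoals(facts, a_head):
    """All a'-tuples over the constants that occur in the given facts."""
    constants = sorted({c for _, args in facts for c in args})
    return set(product(constants, repeat=a_head))

# Description atoms D of the append instance, and DB = {null(nil)}.
d = {
    ("cons", ("l123", "1", "l23")), ("cons", ("l23", "2", "l3")),
    ("cons", ("l3", "3", "nil")), ("cons", ("l12", "1", "l2")),
    ("cons", ("l2", "2", "nil")), ("append", ("nil", "l3", "l3")),
}
db = {("null", ("nil",))}

a, a_head = 3, 3                      # maximum literal arity and arity of append
h = depth_bound(a, a_head, len(d), len(db))
subgoals = ground_subgoals(d | db, a_head)
```

Under this reading, the nine constants of the instance give 729 possible ground argument
tuples for append, comfortably below the bound h = 21^3 = 9261. The bound is therefore loose
but, for fixed a and a', still polynomial in the sizes of D and DB.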

We now argue the correctness of the algorithm as follows. Assume that the hypothesized
recursive literal is "correct", i.e., that the target clause C_T is some subclause of
BOTTOM*_d ∪ L_r. In this case it is easy to see that Force1 will identify C_T, using an
argument that parallels the one made for Force1_NR. Again by analogy to Force1_NR, it is
easy to see that only a polynomial number of equivalence queries will be made involving the
correct recursive literal.

Next assume that L_r is not the correct recursive literal. Then C_T need not be a subclause
of BOTTOM*_d ∪ L_r, and the response to an equivalence query may be either a positive or
negative counterexample. If a positive counterexample e+ is received and ForceSim is
called, then the result may be FAILURE, or it may be a proper subclause of H that covers
e+. Thus the result of choosing an incorrect L_r will be a (possibly empty) sequence of
positive counterexamples followed by either a negative counterexample or FAILURE. Since
all equivalence queries involving the correct recursive literal will be answered by either
a positive counterexample or "yes" [4], then if a negative counterexample or FAILURE is
obtained, it must be that L_r is incorrect.

4. Recall that an answer of "yes" to an equivalence query means the hypothesis is correct.

The number of variables in BOTTOM*_d can be bounded by a|BOTTOM*_d(Dec)|, and
as each closed recursive literal is completely defined by an a'-tuple of variables, the number
of possible closed recursive literals L_r can be bounded by

    p = (a|BOTTOM*_d(Dec)|)^{a'}

Since |BOTTOM*_d(Dec)| is polynomial in ||Dec||, p is also polynomial in ||Dec||. This means
that only a polynomial number of incorrect L_r's need to be discarded. Further, since each
successive hypothesis using a single incorrect L_r is a proper subclause of the previous
hypothesis, only a polynomial number of equivalence queries are needed to discard an
incorrect L_r. Thus only a polynomial number of equivalence queries can be made involving
incorrect recursive literals.

Thus Force1 needs only a polynomial number of queries to identify C_T.

Efficiency. ForceSim runs in time polynomial in its arguments H, f, Dec, DB ∪ D
and h. When ForceSim is called from Force1, h is always polynomial in n_e and n_b, and
H is always no larger than |BOTTOM*_d(Dec)| + 1, which in turn is polynomial in the size
of Dec. Hence every invocation of ForceSim requires time polynomial in n_e, Dec, and DB,
and hence Force1 processes each query in polynomial time.

This completes the proof. ■ 

This result is somewhat surprising, as it shows that recursive clauses can be learned
even given an adversarial choice of training examples. In contrast, most implemented ILP
systems require well-chosen examples to learn recursive clauses.

This formal result can also be strengthened in a number of technical ways. One of
the more interesting strengthenings is to consider a variant of Force1 that maintains a
fixed set of positive and negative examples, and constructs the set of all least general
clauses that are consistent with these examples: this could be done by taking each of the
clauses BOTTOM*_d ∪ L_{r_1}, ..., BOTTOM*_d ∪ L_{r_p}, forcibly simulating them on each of the
positive examples in turn, and then discarding those clauses that cover one or more negative
examples. This set of clauses could then be used to tractably encode the version space of
all consistent programs, using the [S, N] representation for version spaces (Hirsh, 1992).

5. Extending the Learning Algorithm 

We will now consider a number of ways in which the result of Theorem 5 can be extended. 

5.1 The Equality-Predicate and Unique-Mode Assumptions 

Theorem 5 shows that the language family

    d-DepthLinRec[DB_=, a-DetDEC_{=1}]

is identifiable from equivalence queries. It is natural to ask if this result can be extended
by dropping the assumptions that an equality predicate is present and that the declaration
contains a unique legal mode for each predicate: that is, if the result can be extended to
the language family

    d-DepthLinRec[DB, a-DetDEC]

This extension is in fact straightforward. Given a database DB and a declaration Dec =
(p, a', R) that do not satisfy the equality-predicate and unique-mode assumptions, one can
modify them as follows.

1. For every constant c appearing in DB, add the fact equal (c, c) to DB. 

2. For every predicate q that has k valid modes $qs_1, \ldots, qs_k$ in R:

(a) remove the mode declarations for q, and replace them with k mode strings for
the k new predicates $q_{s_1}, \ldots, q_{s_k}$, letting $q_{s_i}s_i$ be the unique legal mode for the
predicate $q_{s_i}$;

(b) remove every fact $q(t_1, \ldots, t_a)$ of the predicate q from DB, and replace it with
the k facts $q_{s_1}(t_1, \ldots, t_a), \ldots, q_{s_k}(t_1, \ldots, t_a)$.

Note that if the arity of predicates is bounded by a constant a, then the number of modes
k for any predicate q is bounded by the constant $2^a$, and hence these transformations can
be performed in polynomial time, and with only a polynomial increase in the size of Dec
and DB.
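The two steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the paper: a ground fact is modelled as a pair (predicate name, argument tuple), a declaration is reduced to a dictionary from predicate names to lists of mode strings such as "+-", and the names `add_equality` and `split_modes` are hypothetical.

```python
from itertools import chain

# Assumed encoding (not from the paper): a ground fact is
# (predicate_name, argument_tuple); the declaration is simplified to a
# dict mapping each predicate name to its list of mode strings.


def add_equality(db):
    """Step 1: add equal(c, c) for every constant c appearing in db."""
    constants = set(chain.from_iterable(args for _, args in db))
    return set(db) | {("equal", (c, c)) for c in constants}


def split_modes(db, modes):
    """Step 2: replace each predicate q by one new predicate per mode of q."""
    new_modes = {}
    new_names = {}
    for pred, mode_list in modes.items():
        new_names[pred] = [f"{pred}_{mode}" for mode in mode_list]
        for mode in mode_list:
            # Step 2a: each new predicate gets exactly one legal mode.
            new_modes[f"{pred}_{mode}"] = [mode]
    new_db = set()
    for pred, args in db:
        # Step 2b: copy the fact once per new predicate. Predicates with no
        # declared modes in `modes` (here: equal) are left untouched.
        for name in new_names.get(pred, [pred]):
            new_db.add((name, args))
    return new_db, new_modes


db = {("succ", (1, 2)), ("succ", (2, 3))}
modes = {"succ": ["+-", "-+"]}
db_eq = add_equality(db)
db_new, modes_new = split_modes(db_eq, modes)
```

With at most $2^a$ modes per predicate, each fact is copied a constant number of times, which matches the polynomial size increase claimed above.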

Clearly any target clause $C_t \in d\text{-}\mathrm{DepthLinRec}[DB, Dec]$ is equivalent to some clause
$C'_t \in d\text{-}\mathrm{DepthLinRec}[DB', Dec']$, where DB' and Dec' are the modified versions of DB
and Dec constructed above. Using Force1 it is possible to identify $C'_t$. (In learning $C'_t$, one
must also perform steps 1 and 2b above on the description part D of every counterexample
(f, D).) Finally, one can convert $C'_t$ to an equivalent clause in $d\text{-}\mathrm{DepthLinRec}[DB, Dec]$
by repeatedly resolving against the clause equal(X,X)←, and also replacing every predicate
symbol $q_{s_i}$ with q.

This leads to the following strengthening of Theorem 5: 

Proposition 6 For any constants a and d, the language family

$d\text{-}\mathrm{DepthLinRec}[\mathcal{DB},\ a\text{-}\mathcal{D}\mathit{et}\mathcal{DEC}]$

is uniformly identifiable from equivalence queries.

5.2 The Datalog Assumption 

So far we have assumed that the target program contains no function symbols, and that the 
background knowledge provided by the user is a database of ground facts. While convenient 
for formal analysis, these assumptions can be relaxed. 

Examination of the learning algorithm shows that the database DB is used in only two 
ways. 

• In forcibly simulating a hypothesis on an extended instance (f, D), it is necessary to
find a substitution $\sigma'$ that makes a literal L true in the database $DB \cup D$. While this
can be done algorithmically if DB and D are sets of ground facts, it is also plausible
to assume that the user has provided an oracle that answers in polynomial time any
mode-correct query L to the database DB. Specifically, the answer of the oracle will
be either

- the (unique) most-general substitution $\sigma'$ such that $DB \wedge D \vdash L\sigma'$ and $L\sigma'$ is
ground; or

- "no" if no such $\sigma'$ exists.

Such an oracle would presumably take the form of an efficient theorem-prover for DB. 

• When calling ForceSim, the top-level learning algorithm uses DB and D to determine
a depth bound on the length of a proof made using the hypothesis program. Again,
it is reasonable to assume that the user can provide this information directly, in the
form of an oracle. Specifically, this oracle would provide for any fact f a polynomial
upper bound on the depth of the proof for f in the target program.

Finally we note that if efficient (but non-ground) background knowledge is allowed, then 
function symbols always can be removed via flattening (Rouveirol, 1994). This transforma- 
tion also preserves determinacy, although it may increase depth — in general, the depth of 
a flattened clause depends also on term depth in the original clause. Thus, the assumption 
that the target program is in Datalog can be replaced by assumptions that the term depth 
is bounded by a constant, and that two oracles are available: an oracle that answers queries 
to the background knowledge, and a depth-bound oracle. Both types of oracles have been 
frequently assumed in the literature (Shapiro, 1982; Page & Frisch, 1992; Dzeroski et al., 
1992). 
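For the ground-fact case, the first of the two oracles described above can be sketched as follows. The encoding is an assumption made for illustration: facts are (predicate name, argument tuple) pairs, a query supplies a constant at each "+" position of the mode and `None` at each "-" position, and the names `make_db_oracle` and `oracle` are hypothetical. An oracle backed by a theorem-prover for non-ground background knowledge would expose the same interface.

```python
def make_db_oracle(facts):
    """Build an oracle for mode-correct queries against a set of ground facts."""

    def oracle(pred, mode, args):
        """Answer one query.

        `mode` is a string over '+' (input) and '-' (output); `args` has a
        constant at every '+' position and None at every '-' position.
        Returns the fully ground argument tuple (the binding for the output
        positions), or None to signal the answer "no".
        """
        matches = {
            fact_args
            for fact_pred, fact_args in facts
            if fact_pred == pred
            and len(fact_args) == len(mode)
            and all(m == "-" or a == b for m, a, b in zip(mode, args, fact_args))
        }
        if len(matches) > 1:
            # Determinacy guarantees at most one answer; flag violations.
            raise ValueError("query is not determinate for this database")
        return next(iter(matches), None)

    return oracle


oracle = make_db_oracle({("succ", (1, 2)), ("succ", (2, 3))})
answer_forward = oracle("succ", "+-", (1, None))
answer_missing = oracle("succ", "+-", (3, None))
answer_backward = oracle("succ", "-+", (None, 3))
```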

5.3 Learning k-ary Recursive Clauses 

It is also natural to ask if Theorem 5 can be extended to clauses that are not linear recursive.
One interesting case is the case of closed k-ary recursive clauses for constant k. It is
straightforward to extend Force1 to guess a tuple of k recursive literals $L_{r_1}, \ldots, L_{r_k}$, and
then to extend ForceSim to recursively generalize the hypothesis clause on each of the facts
$L_{r_1}\sigma, \ldots, L_{r_k}\sigma$. The arguments of Theorems 4 and 5 can be modified to show that this
extension will identify the target clause after a polynomial number of equivalence queries.

Unfortunately, however, it is no longer the case that ForceSim runs in polynomial time. 
This is easily seen if one considers a tree of all the recursive calls made by ForceSim; in 
general, this tree will have branching factor k and polynomial depth, and hence exponential 
size. This result is unsurprising, as the implementation of ForceSim described forcibly 
simulates a depth-bounded top-down interpreter, and a k-ary recursive program can take 
exponential time to interpret with such an interpreter. 
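A quick count makes the blow-up explicit. It assumes, for illustration only, that the tree of recursive calls is complete with branching factor $k \ge 2$ and depth $h$; the symbols $h$, $n$ and $c$ are introduced here and do not appear in the original analysis.

```latex
\[
  \#\text{nodes} \;=\; \sum_{i=0}^{h} k^{i} \;=\; \frac{k^{h+1}-1}{k-1}.
\]
% A polynomial depth h = n^c gives on the order of k^{n^c} nodes, which is
% exponential in n. A logarithmic depth h = c \log_2 n instead gives
\[
  k^{h} \;=\; k^{\,c\log_2 n} \;=\; n^{\,c\log_2 k},
\]
% which is polynomial in n for constant k and c.
```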

There are at least two possible solutions to this problem. One possible solution is to
retain the simple top-down forced simulation procedure, and require the user to provide
a depth bound tighter than $(a\|D\| + a\|DB\|)^a$, the maximal possible depth of a tree. For
example, in learning a 2-ary recursive sort such as quicksort, the user might specify a
logarithmic depth bound, thus guaranteeing that ForceSim is polynomial-time. This requires
additional input from the user, but would be easy to implement. It also has the advantage
(not shared by the approach described below) that the hypothesized program can be
executed using a simple depth-bounded Prolog interpreter, and will always have shallow proof
trees. This seems to be a plausible bias to impose when learning k-ary recursive Prolog
programs, as many of these tend to have shallow proof trees.

A second solution to the possible high cost of forced simulation for k-ary recursive
programs is to forcibly simulate a "smarter" type of interpreter: one which can execute
a k-ary recursive program in polynomial time.$^5$ One sound and complete theorem-prover for
closed k-ary recursive programs can be implemented as follows.

Construct a top-down proof tree in the usual fashion, i.e., using a depth-first left-to-right
strategy, but maintain a list of the ancestors of the current subgoal, and also a list VISITED
that records, for each previously visited node in the tree, the subgoal associated with that
node. Now, suppose that in the course of constructing the proof tree one generates a subgoal
f that is on the VISITED list. Since the traversal of the tree is depth-first left-to-right, the
node associated with f is either an ancestor of the current node, or is a descendant of some
left sibling of an ancestor of the current node. In the former case, the proof tree contains
a loop, and cannot produce a successful proof; in this case the theorem-prover should exit
with failure. In the latter case, a proof must already exist for f, and hence nodes below the
current node in the tree need not be visited; instead the theorem prover can simply assume
that f is true.
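The procedure just described can be sketched in Python as follows. This is a minimal illustration rather than the paper's implementation, and it rests on stated assumptions: subgoals are hashable ground values, the hypothetical callback `is_base(goal)` says whether a goal is settled directly by the database, and the hypothetical callback `expand(goal)` returns the list of recursive subgoals produced by the single determinate clause, or `None` if the clause does not apply. Because there is only one way to expand a goal, any failure aborts the whole proof.

```python
class ProofFailure(Exception):
    """Raised to abandon the whole proof attempt."""


def prove(goal, is_base, expand):
    """Depth-first, left-to-right prover with an ancestor list and VISITED."""
    visited = set()    # every subgoal met so far (the VISITED list)
    ancestors = set()  # subgoals on the path from the root to the current node

    def visit(g):
        if g in visited:
            if g in ancestors:
                raise ProofFailure(g)  # the proof tree contains a loop
            return                     # already proved elsewhere; skip subtree
        visited.add(g)
        if is_base(g):
            return
        subgoals = expand(g)
        if subgoals is None:
            raise ProofFailure(g)      # the clause cannot be applied to g
        ancestors.add(g)
        for sub in subgoals:           # left-to-right
            visit(sub)
        ancestors.discard(g)

    try:
        visit(goal)
        return True
    except ProofFailure:
        return False


# Toy 2-ary recursion: goal n calls n-1 and n-2, goals 0 and 1 are base cases.
calls = []


def expand_fib(n):
    calls.append(n)
    return [n - 1, n - 2]


proved = prove(25, lambda n: n <= 1, expand_fib)
looping = prove("g", lambda g: False, lambda g: [g])
```

In the toy example each non-base goal is expanded once, so the number of expansions grows linearly in the size of the goal space rather than exponentially in the depth.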

This top-down interpreter can be easily extended into a forced simulation procedure:
one simply traverses the tree in the same order, generalizing the current hypothesis H as
needed to justify each inference step in the tree. The only additional point to note is that
if one is performing forced simulation and revisits a previously proved subgoal f at a node
n, the current clause H need not be further generalized in order to prove f, and hence it is
again permissible to simply skip the portion of the tree below n. We thus have the following
result.

Theorem 7 Let $d$-Depth-$k$-Rec be the set of k-ary closed recursive clauses of depth d.
For any constants a, d, and k the language family

$d\text{-}\mathrm{Depth}\text{-}k\text{-}\mathrm{Rec}[\mathcal{DB},\ a\text{-}\mathcal{D}\mathit{et}\mathcal{DEC}]$

is uniformly identifiable from equivalence queries.

Proof: Omitted, but following the informal argument made above. ■ 

Note that we give this result without the restrictions that the database contains an 
equality relation and that the declaration is unique-mode, since the tricks used to relax 
these restrictions in Proposition 6 are still applicable. 

5.4 Learning Recursive and Base Cases Simultaneously 

So far, we have analyzed the problem of learning single clauses: first a single nonrecursive 
clause, and then a single recursive clause. However, every useful recursive program contains 
at least two clauses: a recursive clause, and a nonrecursive base case. It is natural to ask 
if it is possible to learn a complete recursive program by simultaneously learning both a 
recursive clause, and its associated nonrecursive base case. 

In general, this is not possible, as is demonstrated elsewhere (Cohen, 1995). However, 
there are several cases in which the positive result can be extended to two-clause programs. 

5. Note that it is plausible to believe that such a theorem-prover exists, as there are only a polynomial
number of possible theorem-proving goals, namely the $(a\|D\| + a\|DB\|)^a$ possible recursive subgoals.

begin algorithm Force2(d, Dec, DB):
    let L_{r_1}, ..., L_{r_p} be all possible recursive literals for BOTTOM*_d(Dec)
    choose an unmarked recursive literal L_{r_i}
    let H_R ← BOTTOM*_d(Dec) ∪ {L_{r_i}}
    let H_B ← BOTTOM*_d(Dec)
    let P = (H_R, H_B)
    repeat
        Ans ← answer to query "Is H_R, H_B correct?"
        if Ans = "yes" then return (H_R, H_B)
        elseif Ans is a negative example e^- then
            P ← FAILURE
        elseif Ans is a positive example e^+ then
            let (f, D) be the components of e^+
            P ← ForceSim2(H_R, H_B, f, Dec, (DB ∪ D), (a||D|| + a||DB||)^a)
        endif
        if P = FAILURE then
            if all recursive literals L_{r_j} are marked then
                return "no consistent hypothesis"
            else
                mark L_{r_i}
                choose an unmarked recursive literal L_{r_j}
                let H_R ← BOTTOM*_d(Dec) ∪ {L_{r_j}}
                let H_B ← BOTTOM*_d(Dec)
                let P = (H_R, H_B)
            endif
        endif
    endrepeat
end

Figure 6: A learning algorithm for two-clause recursive programs

begin subroutine ForceSim2(H_R, H_B, f, Dec, DB, h):
    % "forcibly simulate" program H_R, H_B on f
    if h < 1 then return FAILURE
    % check to see if f should be covered by H_B
    elseif BASECASE(f) then
        % return current H_R and generalized H_B
        return (H_R, ForceSim_NR(H_B, f, Dec, DB))
    elseif the head of H_R and f cannot be unified then
        return FAILURE
    else
        let L_r be the recursive literal of H_R
        let H' ← H_R − {L_r}
        let A be the head of H'
        let σ be the mgu of A and f
        for each literal L in the body of H' do
            if there is a substitution σ' such that Lσσ' ∈ DB
            then σ ← σ ∘ σ', where σ' is the most general such substitution
            else
                delete L from the body of H', together with
                all literals L' supported (directly or indirectly) by L
            endif
        endfor
        % generalize H', H_B on the recursive subgoal L_r σ
        if L_r σ is ground then
            % continue the simulation of the program
            return ForceSim2(H' ∪ {L_r}, H_B, L_r σ, Dec, DB, h − 1)
        else return FAILURE
        endif
    endif
end

Figure 7: Forced simulation for two-clause recursive programs

In this section, we will first discuss learning a recursive clause and base clause simultane- 
ously, assuming that any determinate base clause is possible, but also assuming that an 
additional "hint" is available, in the form of a special "basecase" oracle. We will then 
discuss various alternative types of "hints". 

Let P be a target program with base clause $C_B$ and recursive clause $C_R$. A basecase
oracle for P takes as input an extended instance (f, D) and returns "yes" if $C_B \wedge DB \wedge D \vdash f$,
and "no" otherwise. In other words, the oracle determines if f is covered by the nonrecursive
base clause alone. As an example, for the append program, the basecase oracle should return
"yes" for an instance append(Xs, Ys, Zs) when Xs is the empty list, and "no" otherwise.

Given the existence of a basecase oracle, the learning algorithm can be extended as
follows. As before, all possible recursive literals $L_{r_i}$ of the clause $\mathrm{BOTTOM}^*_d$ are generated;
however, in this case, the learner will test two-clause hypotheses that are initially of the
form $(\mathrm{BOTTOM}^*_d \cup L_{r_i},\ \mathrm{BOTTOM}^*_d)$. To forcibly simulate such a hypothesis on a fact f,
the following procedure is used. After checking the usual termination conditions, the forced
simulator checks to see if BASECASE(f) is true. If so, it calls $\mathrm{ForceSim}_{NR}$ (with appropriate
arguments) to generalize the current hypothesis for the base case. If BASECASE(f) is
false, then the recursive clause $H_R$ is forcibly simulated on f, a subgoal $L_r\sigma$ is generated
as before, and the generalized program is recursively forcibly simulated on the subgoal.
Figures 6 and 7 present a learning algorithm Force2 for two-clause programs consisting of
one linear recursive clause $C_R$ and one nonrecursive clause $C_B$, under the assumption that
both equivalence and basecase oracles are available.
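The dispatch performed by the forced simulator can be illustrated with a small Python sketch on the append example. It shows only the control flow (depth check, basecase test, one recursive step), not the clause generalization itself. Everything named here is an assumption for illustration: lists are encoded as tuples, `basecase_append` is a hand-written oracle, `step_append` stands in for simulating the recursive clause on one fact, and `record_base` stands in for the nonrecursive forced simulation of the base clause.

```python
FAILURE = None


def basecase_append(fact):
    """Hypothetical basecase oracle for append: "yes" iff the first list is empty."""
    _, (xs, _ys, _zs) = fact
    return len(xs) == 0


def step_append(h_r, fact):
    """Toy stand-in for simulating the recursive clause on one fact.

    Returns (h_r, ground recursive subgoal), or FAILURE when the first
    elements disagree. A real forced simulator would also generalize h_r
    by deleting body literals that fail.
    """
    pred, (xs, ys, zs) = fact
    if not zs or xs[0] != zs[0]:
        return FAILURE
    return h_r, (pred, (xs[1:], ys, zs[1:]))


def force_sim2(h_r, h_b, fact, basecase, generalize_base, step, depth):
    """Control flow of the two-clause forced simulation."""
    if depth < 1:
        return FAILURE
    if basecase(fact):
        # The base clause should cover this fact: generalize it alone.
        return h_r, generalize_base(h_b, fact)
    result = step(h_r, fact)
    if result is FAILURE:
        return FAILURE
    h_r, subgoal = result
    return force_sim2(h_r, h_b, subgoal, basecase, generalize_base, step, depth - 1)


seen_by_base = []


def record_base(h_b, fact):
    """Stand-in for the nonrecursive forced simulation; records its input."""
    seen_by_base.append(fact)
    return h_b


fact = ("append", ((1, 2), (3,), (1, 2, 3)))
ok = force_sim2("H_R", "H_B", fact, basecase_append, record_base, step_append, depth=3)
too_shallow = force_sim2("H_R", "H_B", fact, basecase_append, record_base, step_append, depth=2)
```

In this toy run the base clause is only ever asked to cover append([], [3], [3]), the fact reached after two recursive steps, and a depth bound of 2 is too small to reach it.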

It is straightforward to extend the arguments of Theorem 5 to this case, leading to the 
following result. 

Theorem 8 Let $d$-Depth-2-Clause be the set of 2-clause programs consisting of one
clause in $d$-DepthLinRec and one clause in $d$-DepthNonRec. For any constants a
and d the language family

$d\text{-}\mathrm{Depth}\text{-}2\text{-}\mathrm{Clause}[\mathcal{DB},\ a\text{-}\mathcal{D}\mathit{et}\mathcal{DEC}]$

is uniformly identifiable from equivalence and basecase queries.

Proof: Omitted. ■ 
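The top-level loop of Force2 (Figure 6) can be sketched in Python as follows. Only the control flow is taken from the figure. The teacher and the generalization step are deliberately simplistic stand-ins, chosen so the example runs: a hypothesis is a pair (recursive literal, set of propositional conditions), a positive counterexample is the set of conditions true in it, and `force_sim` simply intersects. All function and variable names are hypothetical.

```python
FAILURE = None
NO_HYPOTHESIS = "no consistent hypothesis"


def force2(recursive_literals, initial_program, equivalence_query, force_sim):
    """Try each candidate recursive literal in turn, generalizing on
    positive counterexamples and abandoning the candidate on a negative one."""
    for literal in recursive_literals:          # choose an unmarked literal
        program = initial_program(literal)
        while program is not FAILURE:
            answer = equivalence_query(program)
            if answer == "yes":
                return program
            sign, example = answer
            if sign == "-":                     # overgeneral: the guess was wrong
                program = FAILURE
            else:                               # cover the positive example
                program = force_sim(program, example)
        # leaving the while loop corresponds to marking `literal`
    return NO_HYPOTHESIS


# --- toy stand-ins (assumptions, not the paper's oracles) ---
TARGET = ("r2", frozenset({"a", "b"}))
ALL_CONDITIONS = frozenset({"a", "b", "c", "d"})
POSITIVES = [frozenset({"a", "b", "c"}), frozenset({"a", "b", "d"})]


def initial_program(literal):
    return (literal, ALL_CONDITIONS)            # most specific hypothesis


def equivalence_query(program):
    literal, body = program
    if literal != TARGET[0]:
        return ("-", None)                      # toy teacher rejects outright
    if body == TARGET[1]:
        return "yes"
    for example in POSITIVES:
        if not body <= example:                 # example not yet covered
            return ("+", example)
    return ("-", None)


def force_sim(program, example):
    literal, body = program
    return (literal, body & example)            # drop conditions that fail


learned = force2(["r1", "r2"], initial_program, equivalence_query, force_sim)
nothing = force2(["r1"], initial_program, equivalence_query, force_sim)
```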

A companion paper (Cohen, 1995) shows that something like the basecase oracle is 
necessary: in particular, without any "hints" about the base clause, learning a two-clause 
linear recursive program is as hard as learning boolean DNF. However, there are several 
situations in which the basecase oracle can be dispensed with. 

Case 1. The basecase oracle can be replaced by a polynomial-sized set of possible base
clauses. The learning algorithm in this case is to enumerate pairs of base clauses $C_{B_i}$
and "starting clauses" $\mathrm{BOTTOM}^*_d \cup L_{r_j}$, generalize the starting clause with forced
simulation, and mark a pair as incorrect if overgeneralization is detected.

Case 2. The basecase oracle can be replaced by a fixed rule that determines when the base
clause is applicable. For example, consider the rule that says that the base clause is
applicable to any atom $p(X_1, \ldots, X_a)$ such that no $X_i$ is a non-null list. Adopting
such a rule leads immediately to a learning procedure that pac-learns exactly those
two-clause linear recursive programs for which the rule is correct.

Case 3. The basecase oracle can also be replaced by a polynomial-sized set of rules for
determining when a base clause is applicable. The learning algorithm in this case is to
pick an unmarked decision rule and run Force2 using that rule as a basecase oracle. If
Force2 returns "no consistent hypothesis" then the decision rule is marked incorrect,
and a new one is chosen. This algorithm will learn those two-clause linear recursive
programs for which any of the given decision rules is correct.
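Cases 2 and 3 can be sketched together in Python. The first function encodes the example rule of Case 2; the second rule and the driver loop are hypothetical illustrations of Case 3. The learner itself is passed in as a callback `run_force2(rule)`, assumed to return either a program or the string "no consistent hypothesis", and lists are assumed to be encoded as Python lists or tuples.

```python
NO_HYPOTHESIS = "no consistent hypothesis"


def no_nonnull_list(args):
    """Case 2 rule: the base clause applies when no argument is a non-null list."""
    return not any(isinstance(x, (list, tuple)) and len(x) > 0 for x in args)


def no_list_longer_than_one(args):
    """Hypothetical second rule: every list argument is null or a singleton."""
    return not any(isinstance(x, (list, tuple)) and len(x) > 1 for x in args)


def learn_with_rules(rules, run_force2):
    """Case 3: try each decision rule as the basecase oracle until one works."""
    for rule in rules:                     # pick an unmarked decision rule
        result = run_force2(rule)
        if result != NO_HYPOTHESIS:
            return rule, result
        # otherwise the rule is marked incorrect and the next one is tried
    return None, NO_HYPOTHESIS


def fake_force2(rule):
    """Stub learner for the example: succeeds only with the singleton rule."""
    return "learned program" if rule is no_list_longer_than_one else NO_HYPOTHESIS


chosen_rule, program = learn_with_rules(
    [no_nonnull_list, no_list_longer_than_one], fake_force2
)
```

Since each rule costs one run of Force2, a polynomial-sized rule set keeps the whole procedure polynomial.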

Even though the general problem of determining a basecase decision rule for an arbitrary 
Datalog program may be difficult, it may be that a small number of decision procedures 
apply to a large number of common Prolog programs. For example, the recursion for most 
list-manipulation programs halts when some argument is reduced to a null list or to a 
singleton list. Thus Case 3 above seems likely to cover a large fraction of the
programs of practical interest in automatic logic programming.

We also note that heuristics have been proposed for finding such basecase decision rules 
automatically using typing restrictions (Stahl, Tausend, & Wirth, 1993). 

5.5 Combining the Results 

Finally, we note that all of the extensions described above are compatible. This means
that if we let $kd$-MaxRecLang be the language of two-clause programs consisting of one
clause $C_R$ that is k-ary closed recursive and depth-d determinate, and one clause $C_B$ that
is nonrecursive and depth-d determinate, then the following holds.

Proposition 9 For any constants a, k and d the language family

$kd\text{-}\mathrm{MaxRecLang}[\mathcal{DB},\ a\text{-}\mathcal{D}\mathit{et}\mathcal{DEC}]$

is uniformly identifiable from equivalence and basecase queries.

5.5.1 Further Extensions 

The notation $kd$-MaxRecLang may seem at this point to be unjustified; although it is the
most expressive language of recursive clauses that we have proven to be learnable, there are
numerous extensions that may be efficiently learnable. For example, one might generalize
the language to allow an arbitrary number of recursive clauses, or to include clauses that are
not determinate. As far as the results presented so far are concerned, these generalizations
might very well be pac-learnable.

However, a companion paper (Cohen, 1995) presents a series of negative results showing
that most natural generalizations of $kd$-MaxRecLang are not efficiently learnable, and
further that $kd$-MaxRecLang itself is not efficiently learnable without the basecase
oracle. Specifically, the companion paper shows that eliminating the basecase oracle leads
to a problem that is as hard as learning boolean DNF, an open problem in computational
learning theory. Similarly, learning two linear recursive clauses simultaneously is as hard
as learning DNF, even if the base case is known. Finally, the following learning problems
are all as hard as breaking certain (presumably) secure cryptographic codes: learning n
linear recursive determinate clauses, learning one n-ary recursive determinate clause, or
learning one linear recursive "k-local" clause. All of these negative results hold not only
for the model of identification from equivalence queries, but also for the weaker models of
pac-learnability and pac-predictability.

6. Related Work 

In discussing related work we will concentrate on previous formal analyses that employ a 
learning model similar to that considered here: namely, models that (a) require all compu- 
tation be polynomial in natural parameters of the problem, and (b) assume either a neutral 
source or adversarial source of examples, such as equivalence queries or stochastically pre- 
sented examples. We note, however, that much previous formal work exists that relies on 
different assumptions. For instance, there has been much work in which membership or subset
queries are allowed (Shapiro, 1982; De Raedt & Bruynooghe, 1992), or where examples are
chosen in some non-random manner that is helpful to the learner (Ling, 1992; De Raedt
& Dzeroski, 1994). There has also been some work in which the efficiency requirements
imposed by the pac-learnability model are relaxed (Nienhuys-Cheng & Polman, 1994). If
the requirement of efficiency is relaxed far enough, very general positive results can be
obtained using very simple learning algorithms. For example, in the model of learnability in the
limit (Gold, 1967), any language that is both recursively enumerable and decidable (which
includes all of Datalog) can be learned by a simple enumeration procedure; in the model
of U-learnability (Muggleton & Page, 1994) any language that is polynomially enumerable
and polynomially decidable can be learned by enumeration.

The most similar previous work is that of Frazier and Page (1993a, 1993b). They analyze 
the learnability from equivalence queries of recursive programs with function symbols but 
without background knowledge. The positive results they provide are for program classes 
that satisfy the following property: given a set of positive examples $S^+$ that requires all
clauses in the target program to prove the instances in $S^+$, only a polynomial number of
recursive clauses are possible; further the base clause must have a certain highly constrained 
form. Thus the concept class is "almost" bounded in size by a polynomial. The learning 
algorithm for such a program class is to interleave a series of equivalence queries that 
test every possible target program. In contrast, our positive results are for exponentially 
large classes of recursive clauses. Frazier and Page also present a series of negative results 
suggesting that the learnable languages that they analyzed are difficult to generalize without 
sacrificing efficient learnability. 

Previous results also exist on the pac-learnability of nonrecursive constant-depth de- 
terminate programs, and on the pac-learnability of recursive constant-depth determinate 
programs in a model that also allows membership and subset queries (Dzeroski et al., 
1992). 

The basis for the intelligent search used in our learning algorithms is the technique 
of forced simulation. This method finds the least implicant of a clause C that covers 
an extended instance e. Although when we developed this method we believed it to be 
original, subsequently we discovered that this was not the case — an identical technique had 
been previously proposed by Ling (1991). Since an extended instance e can be converted 
(via saturation) to a ground Horn clause, there is also a close connection between forced
simulation and recent work on "inverting implication" and "recursive anti-unification"; for
instance, Muggleton (1994) describes a nondeterministic procedure for finding all clauses 
that imply a clause C, and Idestam-Almquist (1993) describes a means of constraining such 
an implicant-generating procedure to produce the least common implicant of two clauses. 
However, while both of these techniques have obvious applications in learning, both are 
extremely expensive in the worst case. 

The CRUSTACEAN system (Aha et al., 1994) uses inverting implication in constrained 
settings to learn certain restricted classes of recursive programs. The class of programs 
efficiently learned by this system is not formally well-understood, but it appears to be 
similar to the classes analyzed by Frazier and Page. Experimental results show that these 
systems perform well on inferring recursive programs that use function symbols in certain 
restricted ways. This system cannot, however, make use of background knowledge. 

Finally, we wish to direct the reader to several pieces of our own research that are
relevant. As noted above, a companion paper exists which presents negative learnability results
for several natural generalizations of the language $kd$-MaxRecLang (Cohen, 1995).
Another related paper investigates the learnability of non-recursive Prolog programs (Cohen,
1993b); this paper also contains a number of negative results which strongly motivate the
restriction of constant-depth determinacy. A final prior paper which may be of interest
presents some experimental results with a Prolog implementation of a variant of the Force2
algorithm (Cohen, 1993a). This paper shows that forced simulation can be the basis of a
learning program that outperforms state-of-the-art heuristic methods such as FOIL
(Quinlan, 1990; Quinlan & Cameron-Jones, 1993) in learning from randomly chosen examples.

7. Conclusions 

Just as it is often desirable to have guarantees of correctness for a program, in many 
plausible contexts it would be highly desirable to have an automatic programming system 
offer some formal guarantees of correctness. The topic of this paper is the learnability of 
recursive logic programs using formally well-justified algorithms. More specifically, we have 
been concerned with the development of algorithms that are provably sound and efficient in 
learning recursive logic programs from equivalence queries. We showed that one constant-depth
determinate closed k-ary recursive clause is identifiable from equivalence queries; this
implies immediately that this language is also learnable in Valiant's (1984) model of
pac-learnability. We also showed that a program consisting of one such recursive clause and
one constant-depth determinate nonrecursive clause is identifiable from equivalence queries
given an additional "basecase oracle", which determines if a positive example is covered by
the non-recursive base clause of the target program alone.

In obtaining these results, we have introduced several new formal techniques for an- 
alyzing the learnability of recursive programs. We have also shown the soundness and 
efficiency of several instances of generalization by forced simulation. This method may have 
applications in practical learning systems. The Force2 algorithm compares quite well
experimentally with modern ILP systems on learning problems from the restricted class that
it can identify (Cohen, 1993a); thus sound learning methods like Force2 might be useful as
a filter before a more general ILP system like FOIL (Quinlan, 1990; Quinlan &
Cameron-Jones, 1993). Alternatively, forced simulation could be used in heuristic programs. For
example, although forced simulation for programs with many recursive clauses is
nondeterministic and hence potentially inefficient, one could introduce heuristics that would make
the forced simulation efficient, at the cost of completeness.

A companion paper (Cohen, 1995) shows that the positive results of this paper are not
likely to be improved: either eliminating the basecase oracle for the language above or
learning two recursive clauses simultaneously is as hard as learning DNF, and learning n
linear recursive determinate clauses, one n-ary recursive determinate clause, or one linear
recursive "k-local" clause is as hard as breaking certain cryptographic codes. With the
positive results of this paper, these negative results establish the boundaries of learnability for
function-free recursive programs in the pac-learnability model. These results thus not only
give a prescription for building a formally justified system for learning recursive programs;
taken together, they also provide upper bounds on what one can hope to achieve with an
efficient, formally justified system that learns recursive programs from random examples
alone.

Appendix A. Additional Proofs 

Theorem 1 states: Let Dec = (p, a', R) be a declaration in $a\text{-}\mathcal{D}\mathit{et}\mathcal{DEC}^{=}$, let $n_r = \|R\|$, let
$X_1, \ldots, X_{a'}$ be distinct variables, and define the clause $\mathrm{BOTTOM}^*_d$ as follows:

$\mathrm{BOTTOM}^*_d(Dec) \equiv \mathrm{CONSTRAIN}_{Dec}(\mathrm{DEEPEN}^d_{Dec}(p(X_1, \ldots, X_{a'}) \leftarrow))$

For any constants d and a, the following are true:

• the size of $\mathrm{BOTTOM}^*_d(Dec)$ is polynomial in $n_r$;

• every depth-d clause that satisfies Dec is equivalent to some subclause of
$\mathrm{BOTTOM}^*_d(Dec)$.

Proof: Let us first establish the polynomial bound on the size of $\mathrm{BOTTOM}^*_d$. Let C be a
clause of size n. As the number of variables in C is bounded by an, the size of the set $L_D$
of literals added by $\mathrm{DEEPEN}_{Dec}$ is bounded by

$n_r \cdot (an)^{a-1}$, that is, (# modes) · (# tuples of input variables).

Thus for any clause C

$\|\mathrm{DEEPEN}_{Dec}(C)\| \le n + (an)^{a-1} n_r$    (1)

By a similar argument

$\|\mathrm{CONSTRAIN}_{Dec}(C)\| \le n + (an)^{a} n_r$    (2)

Since both of the functions $\mathrm{DEEPEN}_{Dec}$ and $\mathrm{CONSTRAIN}_{Dec}$ give outputs that are
polynomially larger in size than their inputs, it follows that composing these functions a constant
number of times, as was done in computing $\mathrm{BOTTOM}^*_d$ for constant d, will also produce
only a polynomial increase in the size.
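The composition step can be made concrete as follows. This is one possible way to carry it out, under assumptions added here: $a \ge 2$, an initial clause size $n_0 \ge 1$, and a deliberately loose bound. The constant $c$ and the sequence $n_i$ are notation introduced for this sketch.

```latex
% The DEEPEN bound, using n \le n^{a-1} for n \ge 1 and a \ge 2:
\[
  \|\mathrm{DEEPEN}_{Dec}(C)\| \;\le\; n + (an)^{a-1} n_r \;\le\; c\,n^{a-1},
  \qquad c = 1 + a^{a-1} n_r .
\]
% Let n_i bound the clause size after i applications of DEEPEN, so that
% n_{i+1} \le c\, n_i^{a-1}. Unrolling the recurrence d times gives
\[
  n_d \;\le\; c^{\,1 + (a-1) + \cdots + (a-1)^{d-1}} \; n_0^{\,(a-1)^d}.
\]
% For constant a and d the exponent of c is a constant, so n_d is polynomial
% in n_r. One final application of the CONSTRAIN bound (2) then yields
\[
  \|\mathrm{BOTTOM}^*_d(Dec)\| \;\le\; n_d + (a\,n_d)^{a}\, n_r ,
\]
% which is again polynomial in n_r.
```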

Next, we wish to show that every depth-d determinate clause C that satisfies Dec is
equivalent to some subclause of $\mathrm{BOTTOM}^*_d$. Let C be some depth-d determinate clause,
and without loss of generality let us assume that no pair of literals $L_i$ and $L_j$ in the body
of C have the same mode, predicate symbol, and sequence of input variables.$^6$
Given C, let us now define the substitution $\theta_C$ as follows:

1. Initially set

$\theta_C \leftarrow \{X^*_1 = X_1, \ldots, X^*_{a'} = X_{a'}\}$

where $X^*_1, \ldots, X^*_{a'}$ are the arguments to the head of $\mathrm{BOTTOM}^*_d$ and $X_1, \ldots, X_{a'}$ are
the arguments to the head of C.

Notice that because the variables in the head of $\mathrm{BOTTOM}^*_d$ are distinct, this mapping
is well-defined.

2. Next, examine each of the literals in the body of C in left-to-right order. For each
literal L, let variables $T_1, \ldots, T_r$ be its input variables. For each literal $L^*$ in the
body of $\mathrm{BOTTOM}^*_d$ with the same mode and predicate symbol whose input variables
$T^*_1, \ldots, T^*_r$ are such that $\forall i : 1 \le i \le r,\ T^*_i\theta_C = T_i$, modify $\theta_C$ as follows:

$\theta_C \leftarrow \theta_C \cup \{U^*_1 = U_1, \ldots, U^*_l = U_l\}$

where $U_1, \ldots, U_l$ are the output variables of L and $U^*_1, \ldots, U^*_l$ are the output variables
of $L^*$.

Notice that because we assume that C contains only one literal L with a given
predicate symbol and sequence of input variables, and because the output variables of
literals $L^*$ in $\mathrm{BOTTOM}^*_d$ are distinct, this mapping is again well-defined. It is also
easy to verify (by induction on the length of C) that in executing this procedure some
variable in $\mathrm{BOTTOM}^*_d$ is always mapped to each input variable $T_i$, and that at least
one $L^*$ meeting the requirements above exists. Thus the mapping $\theta_C$ is onto the
variables appearing in C.$^7$

Let $A^*$ be the head of $\mathrm{BOTTOM}^*_d$, and consider the clause $C'$ which is defined as follows:

• The head of $C'$ is $A^*$.

• The body of $C'$ contains all literals $L^*$ from the body of $\mathrm{BOTTOM}^*_d$ such that either

- $L^*\theta_C$ is in the body of C, or

- $L^*$ is the literal $equal(X^*_i, X^*_j)$ and $X^*_i\theta_C = X^*_j\theta_C$.

We claim that $C'$ is a subclause of $\mathrm{BOTTOM}^*_d$ that is equivalent to C. Certainly $C'$
is a subclause of $\mathrm{BOTTOM}^*_d$. One way to see that it is equivalent to C is to consider
the clause $\hat{C}$ and the substitution $\hat{\theta}_C$ which are generated as follows. Initially, let $\hat{C} = C'$
and let $\hat{\theta}_C = \theta_C$. Then, for every literal $L = equal(X^*_i, X^*_j)$ in the body of $\hat{C}$, delete L
from $\hat{C}$, and finally replace $\hat{C}$ with $\hat{C}\sigma_{ij}$ and replace $\hat{\theta}_C$ with $(\hat{\theta}_C)\sigma_{ij}$, where $\sigma_{ij}$ is the
substitution $\{X^*_i = X^*_{ij}, X^*_j = X^*_{ij}\}$ and $X^*_{ij}$ is some new variable not previously appearing
in $\hat{C}$. (Note: by $(\hat{\theta}_C)\sigma_{ij}$ we refer to the substitution formed by replacing every occurrence
of $X^*_i$ or $X^*_j$ appearing in $\hat{\theta}_C$ with $X^*_{ij}$.) $\hat{C}$ is semantically equivalent to $C'$ because the
operation described above is equivalent to simply resolving each possible L in the body of
$C'$ against the clause "equal(X,X)←".

6. This assumption can be made without loss of generality since for a determinate clause C, the output
variables of $L_i$ and $L_j$ will necessarily be bound to the same values, and hence $L_i$ and $L_j$ could be unified
together and one of them deleted without changing the semantics of C.

7. Recall that a function $f : X \rightarrow Y$ is onto its range Y if $\forall y \in Y\ \exists x \in X : f(x) = y$.

The following are now straightforward to verify:

• $\hat{\theta}_C$ is a one-to-one mapping.

To see that this is true, notice that for every pair of assignments $X^*_i = Y$ and $X^*_j = Y$
in $\theta_C$ there must be a literal $equal(X^*_i, X^*_j)$ in $C'$. Hence following the process
described above the assignments $X^*_i = Y$ and $X^*_j = Y$ in $\hat{\theta}_C$ would eventually be
replaced with $X^*_{ij} = Y$ and $X^*_{ij} = Y$.

• $\hat{\theta}_C$ is onto the variables in C.

Notice that $\theta_C$ was onto the variables in C, and for every assignment $X^*_i = Y$ in $\theta_C$
there is some assignment in $\hat{\theta}_C$ with a right-hand side of Y (and this assignment is
either of the form $X^*_i = Y$ or $X^*_{ij} = Y$). Thus $\hat{\theta}_C$ is also onto the variables in C.

• A literal $\hat{L}$ is in the body of $\hat{C}$ iff $\hat{L}\hat{\theta}_C$ is in the body of C.

This follows from the definition of $C'$ and from the fact that for every literal $L^*$ from
$C'$ that is not of the form $equal(X^*_i, X^*_j)$ there is a corresponding literal in $\hat{C}$.

Thus $\hat{C}$ is an alphabetic variant of C, and hence is equivalent to C. Since $\hat{C}$ is also equivalent
to $C'$, it must be that $C'$ is equivalent to C, which proves our claim. ■